Preface



These are the proceedings of ECSQARU 2001, the Sixth European Conference 
on Symbolic and Quantitative Approaches to Reasoning with Uncertainty held 
in Toulouse, France, on September 19-21, 2001. The series started ten years ago 
in Marseilles, host of ECSQARU 1991, and went to Granada (ECSQARU 1993), 
Fribourg (ECSQARU 1995), Bad Honnef (ECSQARU/FAPR 1997), and London 
(ECSQARU 1999). 

In addition to the contributed papers (selected from over a hundred submissi- 
ons from 23 countries), the scientific program of ECSQARU 2001 included three 
invited talks, given by H. Geffner, F. V. Jensen, and T. Schaub. We would like to thank
Patrice Perny and Alexis Tsioukias for organizing a special session on decision, 
and Rui Da Silva Neves for organizing a session on studies about uncertainty 
from the point of view of psychology. All papers in these two sessions have gone 
through the regular reviewing process and are included in this volume. 

Moreover, three workshops were held prior to the conference itself: “Ma- 
nagement of uncertainty and imprecision in multimedia information systems” 
(Mohand Boughanem, Fabio Crestani, Gabriella Pasi), “Spatio-temporal reaso- 
ning and geographic information systems” (Robert Jeansoulin, Odile Papini), 
and “Adventures in argumentation” (Anthony Hunter, Simon Parsons). Also, 
ECSQARU 2001 was co-located with ESI 2001, the Euro Summer Institute on 
Decision Analysis and Artificial Intelligence. 

We are most grateful to the members of the program committee and all 
the additional reviewers for their work. We are indebted to all the members 
of the organizing committee for their support, which included setting up and 
maintaining Web pages. We would especially like to thank Colette Ravinet, 
Jean-Pierre Baritaud, and Max Delacroix for their help. 
1 Introduction

Over the last decade, graphical models for computer assisted diagnosis and deci- 
sion making have become increasingly popular. Graphical models were originally 
introduced as ways of decomposing distributions over a large set of variables. 
However, the main reason for their popularity is that graphs are easy for hu- 
mans to survey, and most often humans take part in the construction, test, and 
use of systems for diagnosis and decision making. In other words, at various 
points in the life cycle of a system, the model is interpreted by a human or com- 
municated between humans. As opposed to machine learning, we shall call this 
activity human interacted modeling. In this paper we look at graphical models 
from this point of view. We introduce various kinds of graphical models, and the 
comprehensibility of their syntax and semantics is in focus. 



2 Belief Graphs 

We are faced with a particular part of the world. We already have some infor- 
mation that leads us to certain beliefs, and when we get new information, we 
update these beliefs. We organize the world into a set of variables. The possible 
values of variables are called states. The state set of variable A is denoted $s_A$.

Our belief is quantified as a set of real numbers. There are several belief cal- 
culi. Most prominent are probability calculus, fuzzy logic, and belief functions. 
The examples in this paper are taken from probability calculus, but the consid- 
erations are not restricted to this. The uncertainty calculus is expressed through 
potentials with sets of variables as their domains: for a set X of variables, the
uncertainty calculus defines a space, sp(X), and a potential over X is a
real-valued function over sp(X). For probability calculus, sp(X) is the Cartesian
product of the state sets, and for belief functions, sp(X) is the power set over
this Cartesian product.

Definition 1. A belief graph is a pair $(\Gamma, \Phi)$. $\Gamma$ is a graph $(\Omega, \Lambda)$ with $\Omega$ being
a set of variables and $\Lambda$ a set of links. Links may be directed or undirected. $\Phi$ is
a set of potentials. The domains of the potentials are subgraphs of $\Gamma$.
Definition 2. The operations for potentials of belief graphs are combination, $\otimes$,
and projection, $\downarrow$. The joint potential for a belief graph $(\Gamma, \Phi)$ with $\Gamma = (\Omega, \Lambda)$ is
defined as $\mathrm{Bel}(\Omega) = \bigotimes \Phi$, the combination of all potentials. For a set of variables
X, the marginal belief of X is defined as $\mathrm{Bel}(X) = \mathrm{Bel}(\Omega)^{\downarrow X}$.
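
For the probability calculus only, combination can be read as pointwise multiplication and projection as summing variables out. The following minimal Python sketch makes that reading concrete. The `Potential` class, the function names `combine` and `project`, and all numbers are hypothetical illustrations and are not taken from the paper.

```python
from itertools import product


class Potential:
    """A real-valued table over the Cartesian product of the state sets."""

    def __init__(self, variables, states, table):
        self.variables = tuple(variables)  # domain of the potential
        self.states = dict(states)         # variable -> tuple of its states
        self.table = dict(table)           # configuration tuple -> real value


def combine(p, q):
    """Combination for probability calculus: pointwise multiplication."""
    variables = p.variables + tuple(v for v in q.variables if v not in p.variables)
    states = {**p.states, **q.states}
    table = {}
    for config in product(*(states[v] for v in variables)):
        assignment = dict(zip(variables, config))
        p_value = p.table[tuple(assignment[v] for v in p.variables)]
        q_value = q.table[tuple(assignment[v] for v in q.variables)]
        table[config] = p_value * q_value
    return Potential(variables, states, table)


def project(p, keep):
    """Projection onto the variables in `keep`: sum out all the others."""
    kept = tuple(v for v in p.variables if v in keep)
    table = {}
    for config, value in p.table.items():
        assignment = dict(zip(p.variables, config))
        key = tuple(assignment[v] for v in kept)
        table[key] = table.get(key, 0.0) + value
    return Potential(kept, {v: p.states[v] for v in kept}, table)


# Hypothetical two-variable example: P(A) and P(B | A).
yes_no = ("y", "n")
p_a = Potential(("A",), {"A": yes_no}, {("y",): 0.2, ("n",): 0.8})
p_b_given_a = Potential(
    ("B", "A"),
    {"B": yes_no, "A": yes_no},
    {("y", "y"): 0.9, ("n", "y"): 0.1, ("y", "n"): 0.3, ("n", "n"): 0.7},
)
joint = combine(p_a, p_b_given_a)  # joint potential over {A, B}
bel_b = project(joint, {"B"})      # marginal belief of B
```

Here `bel_b` plays the role of the marginal belief of B, obtained by projecting the joint potential onto {B}. Other calculi would need different bodies for the two operations.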

Evidence on a set of variables Y is a potential with the domain Y. A particular
kind of evidence is a potential identifying the state of a single variable.

A breakthrough for the application of belief graphs was the construction of 
efficient algorithms for the calculation of marginal beliefs with evidence included 
in the set of potentials. [16] and [6] give sets of axioms on combination and pro- 
jection that ensure the correctness of these algorithms. The updating algorithms 
are not the issue in this paper. 

For graphs, we use the following terminology: the parents pa(A) denote the
set of variables having a directed link to A, and the family fa(A) is pa(A)
extended with A.

As mentioned above, there are several categories of belief graphs. Each cate- 
gory is characterized by type of graph and potentials permitted and with respect 
to the meaning of the various parts of the model. 

Definition 3. A belief graph category is a set of syntactic rules specifying a 
family of belief graphs. 



Example 1. Bayesian networks is a belief graph category meeting the following 
syntactic constraints: 

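— $\Gamma = (\Omega, \Lambda)$ is a directed acyclic graph.

— There is exactly one potential $\varphi_A$ for each $A \in \Omega$.

— $\mathrm{dom}(\varphi_A) = \mathrm{fa}(A)$.

— $\mathrm{sp}(X) = \times_{A \in X} s_A$.

— $\varphi_A(c) \in [0, 1]$ for each $c \in \mathrm{sp}(\mathrm{fa}(A))$.

— $\sum_A \varphi_A = 1_{\mathrm{pa}(A)}$.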

Figure 1 gives an example of a Bayesian network. The last five of the preced- 
ing rules ensure that the potentials to be specified for a Bayesian network are 
the conditional probabilities for each variable given its parents. In an axiomatic 
setting, the last rule is the unit rule

$\varphi_A^{\downarrow \mathrm{pa}(A)} = 1_{\mathrm{pa}(A)}$,

where $1_X$ is the unit potential with domain X.
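
In the probabilistic reading, the last two constraints say that each table holds values in [0, 1] and that summing the child variable out leaves 1 for every parent configuration. A minimal sketch of such a check follows. The function name `satisfies_unit_rule`, the table layout, and the numbers are assumptions made for illustration.

```python
from itertools import product


def satisfies_unit_rule(cpt, child_states, parent_state_sets, tol=1e-9):
    """Check values in [0, 1] and that the child sums to 1 per parent configuration.

    `cpt` maps (child_state, *parent_states) to a number.
    """
    for parents in product(*parent_state_sets):
        values = [cpt[(a, *parents)] for a in child_states]
        if any(v < 0.0 or v > 1.0 for v in values):
            return False
        if abs(sum(values) - 1.0) > tol:
            return False
    return True


# Hypothetical numbers for a table P(I | B) with binary variables.
good = {("y", "y"): 0.7, ("n", "y"): 0.3, ("y", "n"): 0.1, ("n", "n"): 0.9}
bad = {("y", "y"): 0.7, ("n", "y"): 0.4, ("y", "n"): 0.1, ("n", "n"): 0.9}
good_ok = satisfies_unit_rule(good, ("y", "n"), [("y", "n")])
bad_ok = satisfies_unit_rule(bad, ("y", "n"), [("y", "n")])
# A variable without parents is checked with an empty list of parent state sets.
root_ok = satisfies_unit_rule({("y",): 0.5, ("n",): 0.5}, ("y", "n"), [])
```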



Example 2. Markov networks is a belief graph category meeting the following 
syntactic constraints: 

— $\Gamma = (\Omega, \Lambda)$ is an undirected graph.

— The domains of potentials are complete subgraphs of $\Gamma$.

— $\mathrm{sp}(X) = \times_{A \in X} s_A$.
Fig. 1. A Bayesian network. The potentials to be specified are the following conditional
probabilities: P(A), P(B), P(C | A, B), P(I | B), P(H | C), P(G | C, H, E), P(E | C, I),
P(L | H), P(J | G), P(K | E, J).



— Each pair of neighbours in the graph appears at least once in the same 
domain of a potential. 

— The potentials have non-negative real values. 

Figure 2 gives an example of a Markov network. 

When a belief graph of a certain category is used for modeling, a domain
expert will determine a graph $\Gamma$ representing the domain in sufficient detail.
Therefore, the semantics of the graph must be easy to understand and to
communicate. Likewise, it must be evident from $\Gamma$ what potentials to determine. In
other words, when $\Gamma$ has been determined, the number of potentials to be
specified will be easily read from it, as will their domains. The strength of graphical
models lies in this point. A graph is easy for humans to read and draw, and 
when the meaning of the variables and links is easy to understand, belief graphs 
are very well suited for interpersonal communication. When constructing and/or 
checking a model, we need comprehensible syntax to ensure easy determination 
of whether the model in question satisfies the syntax - with regard to struc- 
ture as well as to potentials. This holds for both Bayesian networks and Markov 
networks. 

Furthermore, we also require the semantics to be comprehensible. The se- 
mantics of a directed link in a belief graph is causal impact and the semantics 
of an undirected link is covariance. Although causality in many ways is a dif- 
ficult concept to work with, we claim that it is much better understood than 
covariance and, from a practical point of view, it is much easier to have a mean- 
ingful dialogue with a domain expert on causal impact than on covariance. The 
semantic concepts causal impact and covariance are made operational through 
the concept of observational independence. 

Definition 4. Two variables A and B are observationally independent if no 
information on A will change the belief on B, and vice versa. 

We omit the term “observational” when obvious from the context. Observa- 
tional independence is often conditioned on specific knowledge, and we talk of 
Fig. 2. A Markov network. The potential domains are subsets of the following sets:
{A, B, C}, {B, C, I}, {C, E, I}, {C, E, G, H}, {E, G, J}, {E, J, K}.



conditional independence: A and B are independent given the set X if they are 
independent whenever the state of each variable in X is known. 

The semantics of the graphical structure yields a set of conditional inde- 
pendence properties and this is the basis for the belief calculation algorithms. 
In general, these properties are called Markov properties. Thus, the request for 
comprehensible semantics has been transformed to a request for comprehensible 
Markov properties. 

Markov Property for Markov Networks (Information Blocking) 

Two distinct variables A and B are independent given X if all paths between 
A and B contain an intermediate variable from X. 
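
Information blocking amounts to a reachability test: search from A without entering X and see whether B can be reached. The sketch below assumes that the links of the network can be read off the sets listed in the caption of Figure 2, treating every listed set as a complete subgraph. The function name `separated` is hypothetical.

```python
from collections import deque
from itertools import combinations


def separated(links, a, b, given):
    """True if every path between a and b passes through a variable in `given`."""
    neighbours = {}
    for u, v in links:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    blocked = set(given)
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt == b:
                return False  # reached b without passing through `given`
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return True


# Assumption: links derived from the sets in the caption of Figure 2.
cliques = ["ABC", "BCI", "CEI", "CEGH", "EGJ", "EJK"]
links = {pair for clique in cliques for pair in combinations(clique, 2)}

a_k_given_ej = separated(links, "A", "K", {"E", "J"})
a_k_given_e = separated(links, "A", "K", {"E"})
```

Under that assumption, A and K are separated by {E, J} but not by {E} alone, because a path through G and J remains open.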

The Markov property for Markov networks is very easy to read and the 
questions to be asked the domain expert are usually comprehensible. 

Markov Property for Bayesian Networks (d-separation) 

Two distinct variables A and B are independent given X if all paths between A 
and B contain an intermediate variable C such that either 

— the connection is serial or diverging and $C \in X$



or 



— the connection is converging and neither C nor any of its descendants are in X.
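
Checking every path by hand becomes tedious for larger graphs. The sketch below does not walk paths. It uses the well-known equivalent moral-graph test for directed acyclic graphs: restrict to the smallest ancestral set of the variables involved, link each node to its parents and the parents to each other, drop directions, and test ordinary separation. The parent sets are an assumption read off the conditional probabilities in the caption of Figure 1, and the function name `d_separated` is hypothetical.

```python
def d_separated(parents, a, b, given):
    """Moral-graph test for independence of a and b given `given` in a DAG."""
    # 1. Smallest ancestral set containing a, b and the conditioning variables.
    relevant, stack = set(), [a, b, *given]
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(parents[node])
    # 2. Moralise: link each node to its parents and the parents to each other.
    neighbours = {node: set() for node in relevant}
    for child in relevant:
        family = [child, *parents[child]]
        for u in family:
            for v in family:
                if u != v:
                    neighbours[u].add(v)
    # 3. Undirected search from a that never enters the conditioning set.
    blocked = set(given)
    seen, stack = {a}, [a]
    while stack:
        node = stack.pop()
        for nxt in neighbours[node]:
            if nxt == b:
                return False
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                stack.append(nxt)
    return True


# Assumption: parent sets taken from the caption of Figure 1.
parents = {
    "A": [], "B": [], "C": ["A", "B"], "I": ["B"], "H": ["C"],
    "G": ["C", "H", "E"], "E": ["C", "I"], "L": ["H"], "J": ["G"],
    "K": ["E", "J"],
}
a_b = d_separated(parents, "A", "B", set())
a_b_given_c = d_separated(parents, "A", "B", {"C"})
a_b_given_k = d_separated(parents, "A", "B", {"K"})
```

With these parent sets, A and B are independent a priori, but become dependent once the converging node C or its descendant K is known.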

The causal interpretation of directed links yields another property that can 
be used to test whether you agree with the model. If you force the state of a 
variable through an external intervention, this does not change your belief on its 
parents, nor does it introduce dependence among them. This kind of conditioning 
is called conditioning by intervention. 

Note that the preceding Markov properties are consequences of the semantics 
of the structure and they are not related to the specific calculus for uncertainty. 
Fig. 3. Serial, diverging, and converging connections.



In an axiomatic setting, the basis for the rule on converging connections is the 
unit rule. 



2.1 Chain Graphs 

Bayesian networks and Markov networks represent two extreme types of graphs 
in which all links are either directed or undirected. We will now consider the 
more general class where we are allowed to mix the types of link. 

Definition 5. Let $\Gamma = (\Omega, \Lambda)$ be a graph. A partially directed path in $\Gamma$ is
a path with at least one directed link such that all directed links have the same
direction. $\Gamma$ is acyclic if it has no partially directed cycles.

The chain components of $\Gamma$ are the connected graphs obtained by removing
all directed links from $\Gamma$. Let $\vartheta$ be a chain component. The parent set
of $\vartheta$, $\mathrm{pa}(\vartheta)$, is the set of nodes with a directed link into a node in $\vartheta$ (see
Figure 4). The family graph for chain component $\vartheta$ is the subgraph of $\Gamma$ containing
$\vartheta \cup \mathrm{pa}(\vartheta)$. The moral family graph is the graph obtained from the family graph
by introducing a link between all members of $\mathrm{pa}(\vartheta)$ and removing all directions.
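
The constructions in Definition 5 are mechanical, as the following sketch suggests. The small mixed graph is hypothetical and is not the graph of Figure 4. The function names and the representation of links as pairs are also assumptions.

```python
def chain_components(nodes, undirected):
    """Connected components that remain after all directed links are removed."""
    neighbours = {n: set() for n in nodes}
    for u, v in undirected:
        neighbours[u].add(v)
        neighbours[v].add(u)
    components, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        components.append(frozenset(component))
    return components


def parent_set(component, directed):
    """Nodes with a directed link into some node of the component."""
    return {u for u, v in directed if v in component}


def moral_family_graph(component, directed, undirected):
    """Family graph with all parents linked pairwise and directions dropped."""
    parents = parent_set(component, directed)
    family = set(component) | parents
    links = {
        frozenset((u, v))
        for u, v in list(directed) + list(undirected)
        if u in family and v in family
    }
    links |= {frozenset((u, v)) for u in parents for v in parents if u != v}
    return family, links


# Hypothetical mixed graph: A -> C, B -> D, C - D, D -> E.
nodes = ["A", "B", "C", "D", "E"]
directed = [("A", "C"), ("B", "D"), ("D", "E")]
undirected = [("C", "D")]

components = chain_components(nodes, undirected)
theta = frozenset({"C", "D"})
theta_parents = parent_set(theta, directed)
family, moral_links = moral_family_graph(theta, directed, undirected)
```

For the component {C, D} the parents are A and B, and moralisation adds the link between A and B.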




Fig. 4. A mixed graph. The non-singleton chain components are {C,E,H,G} with 
parents {A,B,I} and {J,K} with parents {G,E}. 
Let $\vartheta$ be a chain component with parents $\mathrm{pa}(\vartheta)$. The cliques in the moral
family graph we call local factor domains. Note that for a directed graph, the
local factor domains are the families fa(A).

Example 3. Chain graphs is a belief graph category meeting the following syn- 
tactic constraints: 

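— $\Gamma = (\Omega, \Lambda)$ is an acyclic graph.

— There is a potential $\varphi_\delta$ for each local factor domain $\delta$ in each moral family
graph.

— $\mathrm{dom}(\varphi_\delta) = \delta$.

— $\varphi_\delta(c) \in [0, 1]$ for each $c \in \mathrm{sp}(\delta)$.

— Let $\Delta$ be the local factor domains of the chain component $\vartheta$ and let
$\varphi_\vartheta = \bigotimes_{\delta \in \Delta} \varphi_\delta$. Then $\sum_{\vartheta} \varphi_\vartheta = 1_{\mathrm{pa}(\vartheta)}$.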

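In an axiomatic setting, the last rule should be the following version of the
identity rule, where $\varphi_\vartheta$ is the combination of the potentials of the chain
component $\vartheta$:

$\varphi_\vartheta^{\downarrow \mathrm{pa}(\vartheta)} = 1_{\mathrm{pa}(\vartheta)}$.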

The chain graph category is nice and broad, encompassing both Bayesian 
networks and Markov networks. However, the syntax is hardly comprehensible. It 
is possible to test whether a graph is acyclic, but when it comes to the potentials, 
problems arise. The syntactic constraints on potentials may be interpreted so 
that the $\varphi_\vartheta$'s are conditional probabilities $P(\vartheta \mid \mathrm{pa}(\vartheta))$, but it is hard to imagine
a domain expert being happy to provide the potentials for a chain graph. 

Things do not get easier when considering the semantics. The causal in- 
terpretation of directed links is maintained and the interpretation of a family 
must generalise the causal interpretation. A possible interpretation of undirected 
links is that they are due to latent variables. A latent variable may have been 
marginalized out or it may have been conditioned out (see Figure 5). 




Fig. 5. (a) A mixed model that may be the result of marginalizing F out of (b) or 
conditioning G out of (c). 



Unfortunately, this interpretation introduces ambiguity. As shown in Fig- 
ure 5, the chain graph (a) can be the result of marginalizing F out from (b) or 
of conditioning G out from (c). The two possible origins will cause different
conditional independence properties. If Figure 5 (a) is the result of marginalizing F
out of Figure 5 (b), A and B will be independent, but this is not the case if the 
graph is the result of conditioning in Figure 5 (c). Accordingly, the potentials to
be specified for (a) depend on the origin of the undirected link. 

So, the chain graph category is not the appropriate one for representing 
causal models with latent variables. If the model builder is faced with covariance 
without causal direction, she has to analyse the situation in order to find out 
whether the covariance is due to latent variables and, if so, to include them 
temporarily in the model. 

However, undirected covariance may also be caused by a feed-back mech- 
anism. [7] gives an analysis of a causal interpretation of chain graphs where 
a chain component is interpreted as a feed-back complex. This interpretation 
yields the Markov properties proposed by [8] and [4].

Markov Property for Chain Graphs (The Global Chain Markov Prop- 
erty) 

Let (X, Y, Z) be three disjoint sets of variables in a chain graph $\Gamma$ and let G be
the moral graph of the smallest ancestral set containing $X \cup Y \cup Z$.

Then X is independent of Y given Z if any path in G between X and Y 
contains a member of Z (see Figure 6). 

[2] gives a separation criterion in the style of d-separation, but this is too
involved for this short presentation. We conclude that chain graphs will for several
years remain a modeling language for specialists only, as they have incomprehensible
syntax and semantics.





Fig. 6. (a) A chain graph; (b) The moral graph of the smallest ancestral set containing
{A, E, G, I}; (c) The smallest ancestral set containing {A, E, G}. We see that E and
G are independent given A, but they are not independent given A and I.




F.V. Jensen 



2.2 Computational Aspects 

The focus of interest in this paper is on how well suited various categories of 
graphical models are for human interacted model building. As a model-builder, 
I may not at first care about the calculation. What is essential is to have a com- 
prehensible language for specifying my problem in a precise manner. When the 
specification is finished, I “press a button” and the computer runs an algorithm
made up by some smart guys, and this algorithm calculates the answer. In this 
way, the specification is also a kind of formal programming language. When the 
program has been constructed to meet the syntactic requirements, the computer 
can run it and will after a while give me an answer. If I have not been careful, the
computer may run for too long. In the case of general programming languages,
I risk never getting an answer, and I may have no prior idea of how much time
or space the computer will require.

Belief updating is NP-hard, so the risk is that the task is too complicated 
for the computer. However, this can be analysed off-line. When the belief graph 
has been specified, I — or rather the computer — can pre-compile the program 
and report on how complex the task is. 

3 Decision Graphs 

Very often the reason for calculating beliefs is to utilize them in making a decision 
on some actions. This may also be represented in the graphical model. There 
are two types of actions, interventions and observations. An intervention is an 
action that may change the state of the world, whereas an observation only 
yields information on it. The crank for choosing actions is utilities. A utility is a 
real value expressing the value of a certain configuration of a set of variables. We 
assume that there is a way to combine beliefs and utilities in calculating expected 
utilities, and the task is to determine a strategy maximising the expected utility. 

3.1 Influence Diagrams 

In influence diagrams you do not distinguish between intervention decisions and 
observation decisions. An influence diagram is a DAG with three types of vari- 
ables, chance variables (usually represented as circular or elliptic nodes), deci- 
sion variables (usually represented as rectangular nodes), and utility variables 
(usually represented as diamond shaped nodes). 

Apart from being a DAG, an influence diagram must satisfy two other struc- 
tural constraints: 

— utility nodes have no children. 

— there is a directed path comprising all decision nodes (the sequencing con- 
straint). 

A chance node is said to be barren if none of its descendants are decision nodes 
or utility nodes. A barren node has no impact on the expected utility of any 
decision and plays no role in the decision analysis. Therefore, it is customary to 
add the following cosmetic syntactic constraint: 

— there are no barren nodes 
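
The three structural constraints can be checked directly on the graph. The sketch below assumes the graph is already known to be a DAG and is given as a hypothetical `children` map together with a `kinds` map that labels every node as chance, decision, or utility. It relies on the fact that, in a DAG, a directed path comprising all decision nodes exists exactly when the decisions can be ordered so that each one is reachable from its predecessor.

```python
def check_influence_diagram(children, kinds):
    """Return a list of violated structural constraints (empty if none)."""

    def descendants(node):
        seen, stack = set(), list(children.get(node, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(children.get(current, ()))
        return seen

    desc = {node: descendants(node) for node in kinds}
    problems = []

    # Utility nodes have no children.
    for node, kind in kinds.items():
        if kind == "utility" and children.get(node):
            problems.append(f"utility node {node} has children")

    # Sequencing constraint: order decisions by how many decisions follow them,
    # then require each one to reach the next.
    decisions = [n for n, k in kinds.items() if k == "decision"]
    decisions.sort(key=lambda d: sum(e in desc[d] for e in decisions), reverse=True)
    for earlier, later in zip(decisions, decisions[1:]):
        if later not in desc[earlier]:
            problems.append(f"no directed path from {earlier} to {later}")

    # Barren nodes: chance nodes without decision or utility descendants.
    for node, kind in kinds.items():
        if kind == "chance" and all(kinds[m] == "chance" for m in desc[node]):
            problems.append(f"chance node {node} is barren")
    return problems


# Hypothetical diagram: B -> D1 -> D2 -> U and B -> U.
kinds = {"B": "chance", "D1": "decision", "D2": "decision", "U": "utility"}
children = {"B": ["D1", "U"], "D1": ["D2"], "D2": ["U"]}
no_problems = check_influence_diagram(children, kinds)

# Adding a chance node Z below B, with no descendants, makes Z barren.
with_barren = check_influence_diagram(
    {**children, "B": ["D1", "U", "Z"]}, {**kinds, "Z": "chance"}
)
```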
There are two types of potentials attached to influence diagrams, chance
potentials and utility potentials. A utility potential for a utility node U is a real-valued
function with domain pa(U). The domain rules for chance potentials are similar
to the rules for belief graphs, and apart from the operations for belief
potentials, there is an operation for combining chance potentials and utility potentials.
This is treated in more detail by [17] under the term valuation networks. The
term “influence diagram” is reserved for valuation networks based on Bayesian
networks, where the combination of probabilities and utilities is the usual
expectation operation. Figure 7 gives an example of an influence diagram.




Fig. 7. An influence diagram. The utility potentials have the domains
{$D_1$}, {$D_2$}, {$D_3$, A}, {L}, respectively. The probability potentials to be specified are
for each variable A, P(A | pa(A)).



Altogether, the syntax for influence diagrams is comprehensible as is their 
semantics. Directed links into chance variables are causal, links into utility vari- 
ables represent functional dependence, and links into decision variables represent 
temporal precedence. In other words, if the variable X is a parent of the decision
variable D, the state of X is known at the time of deciding on D. Furthermore, 
no-forgetting is assumed. Everything known at some point of decision is also 
known later. The sequencing constraint provides a temporal sequencing of the 
decisions, and due to no-forgetting, an influence diagram specifies precisely what 
is known at each point of decision. 

A policy for a decision variable D in an influence diagram is a function that 
for each configuration of the past yields a state of D. A strategy for an influence 
diagram is a set of policies, one for each decision variable. An optimal strategy 
is a strategy yielding maximal expected utility. We have algorithms well suited 
for calculating an optimal strategy for a specified influence diagram ([17], [5], [14]).

The algorithms exploit dynamic programming by passing through the influ- 
ence diagram in reverse temporal order. For each decision, an optimal policy is 
determined and altogether these policies form an optimal strategy. The complex- 
ity of solving an influence diagram is in general higher than for a belief graph. 
One of the crucial complexity issues is the size of the domains for the optimal 
policies. The domain need not be the entire past, and the algorithms remove 
some irrelevant past. 
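
With a single decision the backward pass collapses to one maximisation per configuration of the observed past. The sketch below illustrates only that step and is not one of the cited algorithms. It assumes one hidden chance variable A, one observed chance variable B, one decision D, and a utility over (A, D). All names and numbers are hypothetical.

```python
def solve_single_decision(prior, likelihood, utility, decisions):
    """Return the optimal policy (observation -> decision) and its expected utility.

    prior[a] = P(a), likelihood[(b, a)] = P(b | a), utility[(a, d)] = U(a, d).
    """
    policy, expected_utility = {}, 0.0
    observations = sorted({b for (b, _a) in likelihood})
    for b in observations:

        def score(d, b=b):
            # Unnormalised expected utility of d given b: sum over a of P(a, b) U(a, d).
            return sum(prior[a] * likelihood[(b, a)] * utility[(a, d)] for a in prior)

        best = max(decisions, key=score)
        policy[b] = best
        expected_utility += score(best)
    return policy, expected_utility


# Hypothetical test-and-treat scenario.
prior = {"sick": 0.1, "healthy": 0.9}
likelihood = {
    ("pos", "sick"): 0.9, ("neg", "sick"): 0.1,
    ("pos", "healthy"): 0.2, ("neg", "healthy"): 0.8,
}
utility = {
    ("sick", "treat"): 80, ("sick", "wait"): 0,
    ("healthy", "treat"): 90, ("healthy", "wait"): 100,
}
policy, meu = solve_single_decision(prior, likelihood, utility, ["treat", "wait"])
```

The policy here has the observed variable B as its whole domain. With several decisions, the same maximisation would be repeated for each decision in reverse temporal order.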

A question of interest in connection with influence diagrams is the relevant 
past for a decision variable. A variable X belongs to the relevant past of the 
decision variable D if the state of X is known at the time of deciding on D and
if the actual state may have an impact on the optimal decision. A variable X 
from the past is said to be structurally irrelevant for D if the actual state of X 
has no impact on the optimal decision for D, no matter the potentials attached. 
We have efficient algorithms for determining structural relevance for all decision 
variables in an influence diagram ([15], [10], [12]). For the influence diagram in 
Figure 7, the structurally relevant pasts for $D_1$, $D_2$ and $D_3$ are {B}, {B, $D_1$}
and {G, E}, respectively. The reason why both B and $D_1$ are relevant for $D_2$ is
that all three variables influence the utility $U_3$.

3.2 Extensions of Influence Diagrams 

The advantage of using influence diagrams when calculating optimal strategies 
is that the sequencing of decisions is determined. However, this constraint is 
unnecessarily strict. For example, two neighboring decisions with no intermediate 
observation can always be swapped — just like it happens that two decisions 
may be made independently of each other. So we can make the syntax less 
strict by removing the sequencing constraint. The ensuing category is called 
partial influence diagrams ([10]). Figure 8 shows an example of a partial influence 
diagram where $D_1$ and $D_2$ precede $D_3$, but nothing is specified concerning the
order of $D_1$ and $D_2$.

Partial influence diagrams have comprehensible syntax and semantics, and 
it is easy to read the partial temporal order from the graph. However, they are 
defective in that their decision scenario may be ambiguous. A decision scenario is 
ambiguous if two extensions of the partial temporal order yield different optimal 
strategies. A partial influence diagram is accordingly said to be well-defined if 
all extensions of the partial temporal order yield the same optimal strategy. [10] 
provides what the authors claim to be the weakest set of structural syntactic 
rules ensuring a well-defined partial influence diagram. The rules consist of a 
systematic search through linear extensions of the partial order, for each of them 
investigating whether they yield the same set of relevant pasts. The syntactic 
definition of well-defined partial influence diagrams is not comprehensible, and 
these diagrams are not really candidates for human interacted model building. 

If a partial influence diagram is well-defined, the classical algorithms from 
influence diagrams can be used. The ambiguity of partial influence diagrams 
Fig. 8. A partial influence diagram.



refers to the use of dynamic programming algorithms yielding different strategies 
for different orders extending the partial order. 

You may consider partial influence diagrams in a different way. A partial 
influence diagram specifies a decision scenario where the task not only is to 
calculate an optimal strategy, but an essential part of it is to calculate an op- 
timal sequence as well. When considered this way, partial influence diagrams 
have comprehensible syntax and semantics. The problem is the computational 
complexity. When solving a partial influence diagram, there is a risk that you 
must solve an influence diagram for each linear extension of the partial temporal 
order. The work by [10] can be exploited to reduce the number of linear orders 
to be investigated. For example, a simple rule is that observations are placed 
as early as possible in the order. Not much work has been carried out in this 
direction, and certainly there are many simplifications to be found. 
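
The worst case mentioned above, one influence diagram per linear extension, can be made tangible by enumerating the extensions of a partial temporal order. The brute-force sketch below is practical only for a handful of decisions. The function name `linear_extensions` is hypothetical, and the example order is the one described for Figure 8.

```python
from itertools import permutations


def linear_extensions(items, precedes):
    """Yield every total order of `items` that respects the pairs in `precedes`."""
    for order in permutations(items):
        position = {item: index for index, item in enumerate(order)}
        if all(position[a] < position[b] for a, b in precedes):
            yield order


# D1 and D2 precede D3; nothing is specified about D1 versus D2.
extensions = list(
    linear_extensions(["D1", "D2", "D3"], [("D1", "D3"), ("D2", "D3")])
)
```

Here there are two extensions, so at most two influence diagrams would have to be solved and compared.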

To represent decision scenarios with non-fixed temporal order, we propose 
another category that we call D-models. A D-model has as basis a Bayesian 
network and here we distinguish between intervention variables (rectangular) 
and observation variables (triangular). A D-model is a directed acyclic graph 
over chance variables, intervention variables, binary observation variables, and 
utility nodes, meeting the following constraints (see Figure 9): 



— utility nodes have no children. 

— observation variables have only chance variables as parents and utility nodes 
as children. 

— intervention variables have no parents. 
Fig. 9. A D-model.



The potentials for D-models are as for influence diagrams. There are no 
potentials attached to observation variables. 

The semantics of observation nodes is the option of observing. By “observing” 
is meant that the state of each parent variable is determined. A non-perfect ob- 
servation is represented through introduction of an intermediate chance variable 
(See Figure 10). 




Fig. 10. Representation of a non-perfect observation of A. 



There is one more issue to clear up. Consider, for example, the decision 
variable $D_1$ in Figure 9. It indicates that a decision can be made that together
with B has an impact on I. What is not clear is whether the variable I exists
before a decision on $D_1$ is made. $D_1$ may be a decision to fix the temperature,
and in this case there will be a temperature influenced by B also before an
action from $D_1$ has been performed. If, on the other hand, the decision is to
start some process with a possible outcome represented by I, “the state of I”
has no meaning before $D_1$ is instantiated.

The problem is resolved by also reading a temporal order into the model: a 
successor to a decision variable D cannot be observed until an action from D 
has been performed. For Figure 9, this means that decisions on $D_1$ and $D_2$ are
made before an observation of G is possible. 

As for belief graphs, we wish to use the semantics to derive rules for 
conditional independence for D-models. 

Markov property for D-models 

Two distinct variables Y and Z are independent given X if for all paths between 
them there is an intermediate variable I such that either

— the connection is serial or diverging and $I \in X$



or 

— the connection is converging and neither I nor any of its descendants are in X.

Due to the direction of the link to an observation node, the Markov property 
for D-models reflects the difference between conditioning by observation and 
conditioning by intervention: if the state of a variable A is known due to an 
intervention, this has no impact on our belief of A’s parents; but if the state is 
known due to an observation, then it may have an impact on our beliefs of the 
variable’s parents. 

The direct representation of observation decisions is not (yet) customary. 
Instead, observation nodes are transformed into intervention decisions. Let $T_A$
be an observation on the variable A. Add a chance variable A' with a state no-t
and the states of A. Add an intervention action $D_A$. A' is a child of A and $D_A$.
If $D_A$ = yes, A' is in the same state as A, and if $D_A$ = no, A' = no-t (see
Figure 11). Note that after the transformation, the Markov property does not
hold.
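
The transformation fixes a deterministic table for the new chance variable A' given A and the added decision. A sketch of how that table could be generated follows. The function name `observation_cpt` and the example states are hypothetical.

```python
def observation_cpt(states_of_a, no_test="no-t"):
    """Deterministic table P(A' | A, D_A), keyed by (a_prime, a, decision)."""
    cpt = {}
    for a in states_of_a:
        for decision in ("yes", "no"):
            # Observing copies the state of A; not observing yields the extra state.
            shown = a if decision == "yes" else no_test
            for a_prime in (*states_of_a, no_test):
                cpt[(a_prime, a, decision)] = 1.0 if a_prime == shown else 0.0
    return cpt


cpt = observation_cpt(("hot", "cold"))
```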





Fig. 11. Representation of observation decisions in influence diagrams. 



A D-model represents a decision scenario for which the task is to find a 
sequencing with an optimal strategy. That is, a D-model can be expanded to 
a set of influence diagrams. For example, the sequence {$T_B$, $D_1$, $D_2$, $T_G$, $D_3$} is
represented by the influence diagram in Figure 7, and the sequence in which
G is not observed is represented by the same influence diagram with the link
from G to $D_3$ removed. Structure analysis similar to the one for partial influence
diagrams can reduce the number of influence diagrams considerably. Also, the 
decomposition of influence diagrams presented by [9] can be exploited to save 
repeated computations, but not much work has been performed with respect to 
efficient algorithms for solving D-models. It should be noted that in general it is 
NP-hard to determine an optimal sequence ([19]). 

D-models can be extended with precedence links analogous to information 
links in influence diagrams. Precedence links can indicate that some decisions 
must be made in a specific order and they reduce the search for an optimal 
sequence. 

So far, the decision scenarios modelled have been symmetric: the decision 
variables and their states have been independent of the past. Various languages 
for representing asymmetric decision scenarios have been proposed ([13], [3], [1], 
[18], [11]), but we shall not go into this issue here.
We consider the problem of planning in a general setting where actions can be
deterministic or probabilistic, and their effects can be fully or partially observable. The task
is to obtain a plan or closed-loop controller given a suitable description of the initial 
situation, actions, and goals. 

We approach this problem by distinguishing three elements: 

- models (that help us to make the tasks mathematically precise) 

- languages (that help us to state problems in a convenient way), and 

- algorithms (that compute the desired solutions: plans, controllers, etc.) 

We show that the models - State Models, Markov Decision Processes (MDPs) and 
Partially Observable MDPs - can be conveniently described using suitable logical action 
languages, and in many cases can be solved by search algorithms that combine ideas 
from heuristic search and dynamic programming. We present empirical results over a 
number of domains and discuss limitations and challenges. 

The talk is mostly self-contained and relevant material (papers, software, slides) can 
be found at: http://www.ldc.usb.ve/~hector



* This is joint work with Blai Bonet. 
We introduce a methodology and framework for expressing general preference
information in logic programming under the answer set semantics. At first, we are interested
in semantical underpinnings for existing approaches to preference handling in extended 
logic programming. To begin with, we explore three different approaches that have been 
recently proposed in the literature. Because these approaches use rather different formal 
means, we furnish a uniform characterization that allows us to gain insights into the
relationships among these approaches. 

We then draw on this study for furnishing implementation techniques. In the resulting 
framework, an ordered logic program is an extended logic program in which rules are 
named by unique terms, and in which preferences among rules are given by a set of 
atoms of the form $s \prec t$ where s and t are names. Such an ordered logic program is
transformed into a second, regular, extended logic program wherein the preferences are 
respected, in that the answer sets obtained in the transformed program correspond with 
the preferred answer sets of the original program. Our approach allows the specification 
of dynamic orderings, in which preferences can appear arbitrarily within a program. 
Static orderings (in which preferences are external to a logic program) are a trivial 
restriction of the general dynamic case. We develop a specific approach to reasoning 
with prescriptive preferences, wherein the preference ordering specifies the order in 
which rules are to be applied. 

Since the result of our translation is an extended logic program, we can make use 
of existing implementations, such as dlv and smodels. To this end, we have developed
the so-called plp compiler, available on the web at
http://www.cs.uni-potsdam.de/~torsten/plp/, as a front-end for these programming systems.
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Abstract. We present in this paper an attempt to deal with ordinal 
information in a strict ordinal framework. We address the problem of 
ranking alternatives in a multiple criteria decision making problem by the 
use of a compensatory aggregation operator, where scores are given on a 
finite ordinal scale. Necessary and sufficient conditions for the existence 
of a representation are given. 



1 Introduction 

In many practical problems, one often has to deal with non-numerical, qualitative
information coming from human agents, decision makers, or any source providing
information in natural language. While this relates broadly to the problem of
modelling knowledge, we address here more particularly the problem of dealing
with ordinal information, that is, information given on some ordinal scale, i.e.
a scale where only order matters, and not numbers. For example, a scale of
evaluation of a product by a consumer such as

1=bad, 2=rather bad, 3=acceptable, 4=more or less good, 5=good

is an ordinal scale, despite the coding by numbers 1 to 5. In fact, these numbers 
are meaningless since one could have defined other numbers as well: 

-23=bad, 2=rather bad, 31=acceptable, 49=more or less good, 50=good. 

These numbers act more as labels than as true numbers, i.e. numbers for which the
usual arithmetical operations have a meaning. The consequence is that any
manipulation of these numbers is forbidden, since meaningless, unless it involves
only order: computing the arithmetic mean or the standard deviation over a
population is a forbidden operation, but the median, like any order statistic,
is permitted. These considerations pertain to measurement theory (see e.g.
the book of Roberts [3]), and have influenced statistics, introducing the notion
of permissible transformations of data (see however [5] for a criticism of this part
of statistics).
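
The point about forbidden operations can be checked on a small case. The sketch below is only an illustration: the two codings are the ones quoted above, while the ratings of the two products are hypothetical. It compares two products by their arithmetic mean under each coding, and reads off the median label under each coding.

```python
from statistics import mean, median_low

LABELS = ["bad", "rather bad", "acceptable", "more or less good", "good"]
CODING_1 = dict(zip(LABELS, [1, 2, 3, 4, 5]))        # first coding in the text
CODING_2 = dict(zip(LABELS, [-23, 2, 31, 49, 50]))   # second coding in the text

# Hypothetical ratings of two products by three consumers each.
product_x = ["bad", "good", "good"]
product_y = ["acceptable", "acceptable", "more or less good"]


def mean_prefers_x(coding):
    """True if the arithmetic mean ranks product_x above product_y."""
    return mean(coding[r] for r in product_x) > mean(coding[r] for r in product_y)


def median_label(ratings, coding):
    """Label of the (low) median; it depends only on the order of the codes."""
    decode = {code: label for label, code in coding.items()}
    return decode[median_low(coding[r] for r in ratings)]


print(mean_prefers_x(CODING_1), mean_prefers_x(CODING_2))
print(median_label(product_x, CODING_1), median_label(product_x, CODING_2))
```

With these ratings the comparison by means is reversed when the coding changes, while the median label is the same under both codings, since both codings preserve the order of the labels.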

* The long version of this paper with all proofs is available as a working paper. 

** On leave from THALES, Corporate Research Laboratory, Domaine de Corbeville, 
Orsay, France 
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If many powerful tools exist when information is quantitative (or cardinal), 
the practitioner lacks adequate tools for problems involving ordinal
information. Most often, people perform an arbitrary mapping onto a (true)
numerical scale, to come back to the cardinal world, and then perform the usual
operations. But, as said above, any operation which is not restricted to
order manipulation is meaningless.

In this paper we focus on decision making under multiple criteria (MCDM):
alternatives, or acts, are evaluated on some (ordinal) scale $E$ with respect to
several criteria. We assume here that the evaluations are done on the same
common scale, representing the degrees of satisfaction (or scores) of the decision
maker for the concerned act, with respect to the different criteria. The problem
of interest is to represent the preference $\succeq$ of the decision maker over a set of
acts $A$ by some mapping $u$ from $A$ to some (again, ordinal) scale $E'$, i.e., for any
$a, b \in A$,

$$a \succeq b \text{ if and only if } u(a) \geq u(b)$$

where $\geq$ denotes the ordering on $E'$. In other words, we would like to mimic
standard multiattribute utility theory in a purely ordinal framework.

Assuming mild conditions on the kind of mapping, which are standard in
multicriteria decision making, this paper gives necessary and sufficient conditions
on $E$, $A$, and $\succeq$ to have such a representation.

A last remark is in order here. Although we have chosen the framework of
multicriteria decision making, our work can be applied as well to decision under
uncertainty with a finite set of states of the world, since a set of scores $a_1, \ldots, a_n$
of $a$ w.r.t. $n$ criteria can be assimilated to a set of utilities of $a$ in different
states of the world.

2 Multicriteria Decision Making in an Ordinal Context 

Let $X_1, \ldots, X_n$ be a set of attributes or descriptors, criteria, of a set of acts
of interest. The Cartesian product $X = X_1 \times \cdots \times X_n$ represents the set of all
possible acts, and we consider $A$ a subset of $X$. We assume that we are able
to assign to each $a \in A$ a vector of scores $(a_1, \ldots, a_n)$, where $a_i$ represents the
degree of satisfaction of the decision maker for act $a$ with respect to criterion
$i$. These scores are all given on a common finite ordinal scale $E = \{e_1 < \cdots < e_k\}$.
The assumption of commensurateness is made, i.e. if $a_i = e_l = a_j$, the
satisfaction of the decision maker is the same for $a_i$ and $a_j$, although $a_i$ and $a_j$
pertain to different criteria. This assumption is a strong one and deserves
a careful study. However, in this paper, we concentrate on aggregation of scores,
leaving aside their construction. In any case, in decision under uncertainty, this
problem of commensurability vanishes. The ordering on $E$ is denoted $\geq$.

We assume that the decision maker can express his/her preferences on $A$
under the form of a binary relation $\succeq$ on $A \times A$, being reflexive, transitive and
complete. We denote as usual $a \sim b$ if $a \succeq b$ and $b \succeq a$ hold, and $a \succ b$ if $a \succeq b$
and $\neg(b \succeq a)$.
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The representation problem we address here is the following: 

Find an ordinal scale $E'$ with an order $\geq_{E'}$ and a mapping $u : A \to E'$
such that for any $a, b \in A$, $a \succeq b$ if and only if $u(a) \geq_{E'} u(b)$.

We propose the following construction for $u$:

$$u = f \circ \mathcal{H} \qquad (1)$$

where $\mathcal{H} : E^n \to E$ is an aggregation operator, and $f : E \to E'$ defines the
scale $E'$. The aggregation operator assigns to every vector of scores $(a_1, \ldots, a_n)$
a global score on the same scale. Aggregation operators in the cardinal (numerical)
case have been studied at length in the fuzzy logic community, for various
applications in decision making (see a survey in e.g. [1]). Generally speaking,
aggregation operators in the field of decision making should satisfy at least two
fundamental properties, which lead to the following definition.

Definition 1. An operator $\mathcal{H} : E^n \to E$ is a compensatory operator if it
satisfies:

(i) limit condition:

$$\min_{i=1}^{n} a_i \leq \mathcal{H}(a_1, \ldots, a_n) \leq \max_{i=1}^{n} a_i, \quad \forall (a_1, \ldots, a_n) \in E^n.$$

(ii) monotonicity: $\forall i \in \{1, \ldots, n\}$, $a_i \geq a'_i$ implies

$$\mathcal{H}(a_1, \ldots, a_{i-1}, a_i, a_{i+1}, \ldots, a_n) \geq \mathcal{H}(a_1, \ldots, a_{i-1}, a'_i, a_{i+1}, \ldots, a_n).$$

The operator is said to be weakly compensatory if only (i) holds$^1$.

The first property says that the global evaluation should not lie beyond the
scores on the criteria, while the second one ensures that an improvement on one
criterion cannot decrease the global score. In the sequel, with a slight abuse of
notation, we will write $\mathcal{H}(a)$ instead of $\mathcal{H}(a_1, \ldots, a_n)$, for any $a \in A$.
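
On a small finite scale, the two properties of Definition 1 can be tested by exhaustive enumeration. The following is a minimal sketch, assuming that the degrees of $E$ are coded by the integers 1 to k (only their order is used) and reading both inequalities of the limit condition as non-strict; the function names and the test operators `median3` and `fickle` are hypothetical.

```python
from itertools import product

K = 5  # number of degrees of the scale (hypothetical)
N = 3  # number of criteria (hypothetical)


def is_weakly_compensatory(op, k, n):
    """Limit condition: min(a) <= op(a) <= max(a) for every score vector a."""
    return all(min(a) <= op(a) <= max(a)
               for a in product(range(1, k + 1), repeat=n))


def is_monotone(op, k, n):
    """Raising one score by one degree never lowers the global score.
    One-degree steps suffice on a finite chain, by transitivity."""
    for a in product(range(1, k + 1), repeat=n):
        for i in range(n):
            if a[i] < k:
                raised = a[:i] + (a[i] + 1,) + a[i + 1:]
                if op(raised) < op(a):
                    return False
    return True


def is_compensatory(op, k, n):
    return is_weakly_compensatory(op, k, n) and is_monotone(op, k, n)


def median3(a):
    """Median of three scores: an order statistic."""
    return sorted(a)[1]


def fickle(a):
    """Stays between min and max, yet drops when the first score reaches K."""
    return min(a) if a[0] == K else max(a)


print(is_compensatory(median3, K, N))
print(is_weakly_compensatory(fickle, K, N), is_monotone(fickle, K, N))
```

Here `median3`, like min and max, passes both checks, whereas `fickle` satisfies only the limit condition and is therefore weakly compensatory in the sense of the definition.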

We call $(A, \succeq, E)$ the decision profile of the decision maker, that is, the set
of all vectors of scores $(a_1, \ldots, a_n)$, $a \in A$, expressed on $E$, together with the
preference relation.

Definition 2. (i) The decision profile is coherent if for no pair of acts $a, b$
we have both $a \succ b$ and $a_i \leq b_i$ for all $i$ in $\{1, \ldots, n\}$.

(ii) The decision profile is weakly coherent if there is no $a, b$ in $A$ such that
$a \succ b$ and $\max_{i=1}^{n} a_i \leq \min_{i=1}^{n} b_i$.

Obviously, coherence implies weak coherence.
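
Both notions of Definition 2 are easy to test mechanically. The sketch below is an illustration under stated assumptions: scores are coded by integers, each act carries the index of its preference class (a higher index meaning strictly preferred), and the inequalities of the definition are read as non-strict. The three small profiles are hypothetical.

```python
def is_coherent(profile):
    """No act is strictly preferred to an act that is at least as good on every criterion."""
    return not any(ca > cb and all(x <= y for x, y in zip(a, b))
                   for a, ca in profile for b, cb in profile)


def is_weakly_coherent(profile):
    """No act is strictly preferred to an act whose worst score reaches its best score."""
    return not any(ca > cb and max(a) <= min(b)
                   for a, ca in profile for b, cb in profile)


# Each entry is (vector of scores, class index).
fine = [((1, 2, 2), 1), ((3, 3, 4), 2)]
only_weak = [((2, 3, 5), 1), ((1, 3, 4), 2)]   # preferred act is dominated
broken = [((3, 3, 4), 1), ((1, 2, 2), 2)]      # preferred act lies entirely below

for p in (fine, only_weak, broken):
    print(is_coherent(p), is_weakly_coherent(p))
```

The profile `only_weak` illustrates that weak coherence does not imply coherence: the preferred act is dominated on every criterion, yet the supports of the two acts overlap.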

Let us denote by $A_1, \ldots, A_{k'}$ the equivalence classes of $\sim$, i.e. $\forall i$, $\forall a, b \in A_i$,
$a \sim b$. We number them in such a way that $\forall a \in A_i, \forall a' \in A_j$, $a \succ a' \Leftrightarrow i > j$.
Obviously, we need at least $k'$ degrees to represent this order, so that $E'$ should
have at least $k'$ degrees.

$^1$ Usually, min and max are excluded from the class of compensative operators, so our
definition is slightly different from the common one.

- If $k' = k$, we can choose $E' = E$, so that $f$ must be the identity mapping.

- If $k' < k$, then several degrees of $E$ will map to the same degree of $E'$. In
other words, $f$ is a surjective non decreasing mapping, which induces by $f^{-1}$
a partition of $E$. $E'$ can then be considered as the result of partitioning $E$.

- If $k' > k$, the original scale $E$ is not fine enough to represent the preference,
and some degrees have to be added.

Thus, the condition $k' \leq k$ appears as a first necessary condition to represent
the preference on the given fixed scale $E$, i.e. we should have at least as many
degrees on $E$ as equivalence classes of $\sim$. In this paper, we look for additional
necessary and sufficient conditions in order to represent the preference.



3 Preference Representation by a Compensatory 
Operator 

In this section, we try to find necessary and sufficient conditions in order to
have a representation of the preference relation by a compensatory operator,
assuming $k' \leq k$.



3.1 Preliminary Definitions and Notations 

Let us define a new scale $E' = \{e'_1, \ldots, e'_{k'}\}$, with $e'_1 < \cdots < e'_{k'}$, where $e'_j$
corresponds to class $A_j$.

We introduce the following particular elements of $E$, for any class $A_j$:

$$\lfloor A_j := \min_{a \in A_j} \min_{i=1}^{n} a_i$$

$$A_j \rceil := \max_{a \in A_j} \max_{i=1}^{n} a_i.$$

The interval $[\lfloor A_j, A_j \rceil]$ is denoted $\lfloor A_j \rceil$ for simplicity. Note that this interval
may reduce to a singleton. In order to avoid cumbersome conditions for some
definitions, we introduce two (fictitious) additional elements $e_0$ and $e_{k+1}$ on the
scale, such that $e_0 < e_1$ and $e_k < e_{k+1}$, and the fictitious classes $A_0$ and $A_{k'+1}$
(worst and best possible classes), with $\lfloor A_0 \rceil = \{e_0\}$ and $\lfloor A_{k'+1} \rceil = \{e_{k+1}\}$.

Considering two intervals $[a, b]$, $[c, d]$ (possibly reduced to singletons) of $E$, we
say that $[a, b]$ is to the left (resp. to the right) of $[c, d]$ if $b < c$ (resp. $a > d$). This
is denoted as $[a, b] < [c, d]$ (resp. $[a, b] > [c, d]$). Lastly, for any interval $I = [a, b]$,
we denote by $\sharp[a, b]$ or $|I|$ the number of elements in interval $[a, b]$, and denote
the bounds by $\lfloor I := a$ and $I \rceil := b$. This notation is extended to the supports of
acts $a \in A$ (i.e. the interval $[\min_i a_i, \max_i a_i]$) by $\lfloor a := \min_{i=1}^{n} a_i$ and
$a \rceil := \max_{i=1}^{n} a_i$. For simplicity, the support is denoted $\lfloor a \rceil$.

We state our problem more precisely using the following definition.

Definition 3. Let $f : E \to E'$ be a surjective non decreasing mapping defining
a partition on $E$ by $f^{-1}(\cdot)$. The mapping $f$ is compatible with a compensatory
operator (resp. weakly compensatory) with respect to $\succeq$ if there exists a
compensatory operator $\mathcal{H} : E^n \to E$ (resp. weakly compensatory), such that $f \circ \mathcal{H}$
represents $\succeq$, i.e.

$$\forall a, b \in A, \quad a \succeq b \Leftrightarrow f \circ \mathcal{H}(a) \geq f \circ \mathcal{H}(b).$$

In the sequel, we try to find necessary and sufficient conditions for the existence
of a mapping compatible with a (weakly) compensatory operator.

3.2 Main Result 

The following definition is useful [2]. 

Definition 4. The core of $A_j$, for any $j$ in $\{1, \ldots, k'\}$, is defined as:

$$K_j := \emptyset, \text{ if } \min_{a \in A_j} \max_{i=1}^{n} a_i > \max_{a \in A_j} \min_{i=1}^{n} a_i$$

$$K_j := \left[\min_{a \in A_j} \max_{i=1}^{n} a_i, \; \max_{a \in A_j} \min_{i=1}^{n} a_i\right], \text{ otherwise.}$$

The core is non empty every time there exist two acts $a, b$ in $A_j$ with disjoint
supports (or coinciding on only one point) for the scores, i.e. such that
$\min_i a_i \geq \max_i b_i$ (see figure 1, where the support of three acts $a, b, c$ is figured
on a 7-degree scale). In order that $f \circ \mathcal{H}$ with $\mathcal{H}$ being weakly compensatory can
represent the preference, it is necessary that $f^{-1}(e'_j)$ contains $K_j$. Note that the
existence of a non empty core is rather an abnormal situation. It means that no
compensatory operator can represent the preference on $E$, since for $a, b \in A_j$
with disjoint supports we have both $a \sim b$ and $\mathcal{H}(a) \neq \mathcal{H}(b)$. But thanks to $f$,
the evaluation on $E'$ can be the same. This shows the usefulness of $f$.

Fig. 1. The core of a class: (left) empty core, (center and right) non empty core
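
The core of a class can be computed directly from the definition. The following minimal sketch assumes that degrees are coded by integers and that a class is given as a list of score vectors; the function name `core` and the three sample classes are hypothetical.

```python
def core(acts):
    """Core of a class, returned as (left bound, right bound), or None if empty."""
    low = min(max(a) for a in acts)    # smallest of the best scores
    high = max(min(a) for a in acts)   # largest of the worst scores
    return None if low > high else (low, high)


print(core([(1, 2, 2), (4, 3, 3)]))   # supports [1,2] and [3,4] are disjoint
print(core([(1, 4, 2), (3, 5, 4)]))   # supports [1,4] and [3,5] overlap
print(core([(1, 1, 6)]))              # a single act with a non-constant vector
```

In the first class the two supports are disjoint, so the core is the interval between them, bounds included. In the other two classes the core is empty.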

Lemma 1. If the decision profile $(A, \succeq, E)$ is weakly coherent, then the non
empty cores (if any) are disjoint, and they are ordered the right way, i.e. $K_j > K_{j'}$
whenever $j' < j$.

We introduce another useful interval of $\lfloor A_j \rceil$.

Definition 5. For any class $A_j$, $j = 1, \ldots, k'$,

$$A_{>j} \rceil := \min_{j' > j} \min_{a' \in A_{j'}} a' \rceil$$

$$\lfloor A_{<j} := \max_{j' < j} \max_{a' \in A_{j'}} \lfloor a'.$$

Defining the open interval (whenever non empty) $(A_j) := \,]\lfloor A_{<j}, A_{>j} \rceil[$,
the interior of $A_j$, denoted by $\lfloor \mathring{A}_j \rceil$, is defined by:

$$\lfloor \mathring{A}_j \rceil := \lfloor A_j \rceil \cap (A_j).$$

Figure 2 illustrates the definition, with three classes, and $a, b, c \in A_1$, $d, e$ in
$A_2$, and $f, g, h \in A_3$. Note that $A_{>k'} \rceil$ and $\lfloor A_{<1}$ are properly defined thanks
to the additional classes $A_0$ and $A_{k'+1}$. Indeed, $(A_1) = \,]e_0, A_{>1} \rceil[$ and
$(A_{k'}) = \,]\lfloor A_{<k'}, e_{k+1}[$. Note also that the interior could be empty, even if the decision
profile is coherent, as shown by the following simple example.

[Figure 2, with panels labelled class $A_1$, class $A_2$, class $A_3$]



Example 1: Let us consider $n = 3$ and 3 acts $a, b, c$ such that $c \succ b \succ a$,
evaluated on a scale with $k = 7$, defined in the table below.

act   criterion 1   criterion 2   criterion 3
a     $e_4$         $e_6$         $e_4$
b     $e_4$         $e_7$         $e_4$
c     $e_1$         $e_5$         $e_5$

As can be verified, the decision profile is coherent, but since $\lfloor A_{<2} = e_4$
and $A_{>2} \rceil = e_5$, we have $(A_2) = \emptyset$ and thus the interior too is empty (see
figure 3).




Fig. 3. Case of empty interior 
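
The computation behind Example 1 can be reproduced with a few lines. The sketch below rests on assumptions: the table is read as a = (e4, e6, e4), b = (e4, e7, e4), c = (e1, e5, e5); degrees are coded by their index; the left bound of the open interval of a class is taken as the largest minimum score among acts of lower classes, and the right bound as the smallest maximum score among acts of higher classes, with the fictitious degrees 0 and k+1 when no such act exists. The function names are hypothetical.

```python
def open_interval(classes, j, k):
    """Degrees strictly between the two bounds attached to class j."""
    lower = max((min(a) for jp, acts in classes.items() if jp < j for a in acts),
                default=0)        # fictitious degree e_0
    upper = min((max(a) for jp, acts in classes.items() if jp > j for a in acts),
                default=k + 1)    # fictitious degree e_{k+1}
    return list(range(lower + 1, upper))


def interior(classes, j, k):
    """Intersection of the open interval with the support of class j."""
    lo = min(min(a) for a in classes[j])
    hi = max(max(a) for a in classes[j])
    return [e for e in open_interval(classes, j, k) if lo <= e <= hi]


# Example 1, with c preferred to b preferred to a (class 3 is the best).
example_1 = {1: [(4, 6, 4)], 2: [(4, 7, 4)], 3: [(1, 5, 5)]}

for j in (1, 2, 3):
    print(j, open_interval(example_1, j, 7), interior(example_1, j, 7))
```

For class 2 the bounds are degrees 4 and 5, so no degree lies strictly between them and both lists are empty, which is the situation described in the example.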



We give some properties of the interior. 

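Lemma 2. Let $(A, \succeq, E)$ be a decision profile. For any class $A_j$, the following
properties hold.

(i) $A_{>j} \rceil$ and $\lfloor A_{<j}$ are non decreasing with $j$, for $j = 1, \ldots, k'$.

(ii) the intervals $\lfloor \mathring{A}_j \rceil$, $j = 1, \ldots, k'$, whenever nonempty are such that
$\lfloor \mathring{A}_j$ and $\mathring{A}_j \rceil$ are non decreasing with $j$, where $\lfloor \mathring{A}_j$ and $\mathring{A}_j \rceil$ are
respectively the left and right bounds of $\lfloor \mathring{A}_j \rceil$.

(iii) $\forall j$, $\forall j' < j$ such that $K_{j'} \neq \emptyset$ and $\lfloor \mathring{A}_j \rceil \neq \emptyset$, $K_{j'}$ and $\lfloor \mathring{A}_j \rceil$ are disjoint,
and the latter is to the right of the former (and symmetrically for $j' > j$).

(iv) if for some $j' < j$ such that $\lfloor \mathring{A}_j \rceil \neq \emptyset$, $\lfloor \mathring{A}_{j'} \rceil \neq \emptyset$, and
$\lfloor \mathring{A}_j \rceil \cap \lfloor \mathring{A}_{j'} \rceil \neq \emptyset$, then no $a \in A$ can be such that
$\lfloor a \rceil \subseteq \lfloor \mathring{A}_j \rceil \cap \lfloor \mathring{A}_{j'} \rceil$.

Lemma 3. Let $(A, \succeq, E)$ be a weakly coherent decision profile. For any class $A_j$,
the following properties hold.

(i) $\sharp[\lfloor A_{<j}, A_{>j} \rceil] > 1$.

(ii) for any $a \in A_j$, if $(A_j) \neq \emptyset$, then necessarily $\lfloor a \rceil \cap \lfloor \mathring{A}_j \rceil \neq \emptyset$, otherwise
$\sharp(\lfloor a \rceil \cap [\lfloor A_{<j}, A_{>j} \rceil]) = 2$.

(iii) if $(A_j) \neq \emptyset$, then $\lfloor \mathring{A}_j \rceil \neq \emptyset$.

(iv) if $K_j \neq \emptyset$, then $\lfloor \mathring{A}_j \rceil \neq \emptyset$, and $\lfloor \mathring{A}_j \rceil \supseteq K_j$.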
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As we have seen, the weak coherence of the decision profile is not enough to 
ensure that the interior is non empty, although the above lemmas show that non 
empty interiors have many properties. We introduce the following definition, 
which is closely related to the non-emptiness of interiors. 

Definition 6. $(A, \succeq, E)$ satisfies the condition of representability by a
compensatory (resp. weakly compensatory) operator if there are enough degrees in $E$
for representing $\succ$ by a compensatory (resp. weakly compensatory) operator $\mathcal{H}$,
i.e., there exists a compensatory (resp. weakly compensatory) operator $\mathcal{H}$ such
that:

$$\forall a \in A_j, \forall a' \in A_{j'}, \; j' < j, \quad \mathcal{H}(a') < \mathcal{H}(a).$$

Lemma 4. Let $(A, \succeq, E)$ be a decision profile satisfying the condition of
representability by a weakly compensatory operator. Then

(i) $(A, \succeq, E)$ is weakly coherent.

(ii) $\forall j, j' \in \{1, \ldots, k'\}$ such that $j > j'$ and $K_j, K_{j'} \neq \emptyset$,

$$\sharp[K_{j'} \rceil, \lfloor K_j] \geq j - j' + 1$$

and boundary conditions:

$$\sharp[K_{j'} \rceil, e_k] \geq k' - j' + 1$$

where $j'$ is the index of the rightmost core,

$$\sharp[e_1, \lfloor K_{j'}] \geq j'$$

where $j'$ is the index of the leftmost core.

(iii) $\forall j, j'$, $j' < j$, $\sharp[\lfloor A_{j'}, A_j \rceil] \geq j - j' + 1$.

(iv) $\lfloor \mathring{A}_j \rceil \neq \emptyset$, for $j = 1, \ldots, k'$.

(v) If, in addition, $(A, \succeq, E)$ satisfies the condition of representability by a
compensatory operator, then the profile is coherent.

Remark 1: Let $(A, \succeq, E)$ be a weakly coherent decision profile. If for
some $j$, $\lfloor \mathring{A}_j \rceil = \emptyset$, then Lemma 4 (iv) implies that the profile is not
representable. This is the case of Ex. 1. In fact, from Lemma 3 (iii), the
condition $(A_j) = \emptyset$ suffices.

Remark 2: If there exists a surjective non decreasing mapping $f : E \to E'$
compatible with a (weakly) compensatory operator $\mathcal{H}$, then necessarily
$(A, \succeq, E)$ is representable, precisely by $\mathcal{H}$. Indeed, let us take $a, b$ such
that $a \succ b$. Then $f \circ \mathcal{H}(a) > f \circ \mathcal{H}(b)$, which implies $\mathcal{H}(a) > \mathcal{H}(b)$
since $f$ is non decreasing.

Theorem 1. Let $(A, \succeq, E)$ be a decision profile. There exists a non decreasing
surjective mapping $f$ defining a partition of $E$ in $k'$ elements, compatible with a
weakly compensatory operator, if and only if the following conditions are satisfied:

(i) $(A, \succeq, E)$ is weakly coherent

(ii) $\forall j, j' \in \{1, \ldots, k'\}$ such that $j > j'$ and $K_j, K_{j'} \neq \emptyset$,

$$\sharp[K_{j'} \rceil, \lfloor K_j] \geq j - j' + 1$$

and boundary conditions:

$$\sharp[K_{j'} \rceil, e_k] \geq k' - j' + 1$$

where $j'$ is the index of the rightmost core,

$$\sharp[e_1, \lfloor K_{j'}] \geq j'$$

where $j'$ is the index of the leftmost core.

(iii) $\lfloor \mathring{A}_j \rceil \neq \emptyset$ for $j = 1, \ldots, k'$.

(iv) $\forall j, j' \in \{1, \ldots, k'\}$ such that $j > j'$,

$$\sharp[\lfloor \mathring{A}_{j'}, \mathring{A}_j \rceil] \geq j - j' + 1.$$

Remark 3: In condition (iii), $\lfloor \mathring{A}_j \rceil$ can be replaced by $(A_j)$. Also,
condition (iii) is a special case of (iv) if we allow $j = j'$.

Remark 4: Taking (iv) with $j = k'$, $j' = 1$, we get $\sharp[\lfloor \mathring{A}_1, \mathring{A}_{k'} \rceil] \geq k'$,
which entails $k \geq k'$.

Remark 5: condition (iv) is sound since, due to Lemma 2 (ii), interiors
are ordered, so that $[\lfloor \mathring{A}_{j'}, \mathring{A}_j \rceil]$ is never empty.

The following can be easily obtained. 

Theorem 2. Let $(A, \succeq, E)$ be a decision profile. There exists a non decreasing
surjective mapping $f$ defining a partition of $E$ in $k'$ elements, compatible with a
compensatory operator, if and only if the following conditions are satisfied:

(i) $(A, \succeq, E)$ is coherent

(ii) $\forall j, j' \in \{1, \ldots, k'\}$ such that $j > j'$ and $K_j, K_{j'} \neq \emptyset$,

$$\sharp[K_{j'} \rceil, \lfloor K_j] \geq j - j' + 1$$

and boundary conditions:

$$\sharp[K_{j'} \rceil, e_k] \geq k' - j' + 1$$

where $j'$ is the index of the rightmost core,

$$\sharp[e_1, \lfloor K_{j'}] \geq j'$$

where $j'$ is the index of the leftmost core.

(iii) $\lfloor \mathring{A}_j \rceil \neq \emptyset$ for $j = 1, \ldots, k'$.

(iv) $\forall j, j' \in \{1, \ldots, k'\}$ such that $j > j'$,

$$\sharp[\lfloor \mathring{A}_{j'}, \mathring{A}_j \rceil] \geq j - j' + 1.$$

3.3 Example 

Figure 4 illustrates the preceding facts. We consider 8 acts $a, b, c, d, e, f, g, h$, 3
criteria and a scale $E$ with 9 degrees. The decision profile is defined as follows.

act   criterion 1   criterion 2   criterion 3   class
a     $e_1$         $e_2$         $e_2$         $A_1$
b     $e_4$         $e_3$         $e_3$         $A_1$
c     $e_1$         $e_1$         $e_6$         $A_2$
d     $e_3$         $e_5$         $e_4$         $A_3$
e     $e_4$         $e_4$         $e_6$         $A_3$
f     $e_8$         $e_7$         $e_7$         $A_3$
g     $e_9$         $e_8$         $e_7$         $A_4$
h     $e_5$         $e_9$         $e_7$         $A_5$

Classes are such that $A_1 \prec A_2 \prec \cdots \prec A_5$. There are two cores $K_1$ and $K_3$.

It can be verified that the profile is coherent, $(E, A)$ is compatible, $k' \leq k$,
and conditions (iii) and (iv) of the theorem are satisfied. Hence there exists a
partition of $E$ compatible with a compensatory operator, in the sense of theorem
2 (unique in this case).
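
For a profile of this size, the candidate partitions of $E$ can simply be enumerated. The sketch below is not the constructive argument of the paper; it is a brute-force check under stated assumptions. The table is read with degrees coded by their index and with act h placed in a fifth class, which is an interpretation of the source table. Only the weakly compensatory case is tested: a non decreasing surjection f is accepted when every act has, inside its support, at least one degree that f maps to the class of the act, so that a global score respecting the limit condition can be chosen act by act. Monotonicity of the operator is not checked.

```python
from itertools import combinations

K = 9  # number of degrees of E

# act -> (scores, class index); the class of h is an assumed reading of the table
PROFILE = {
    "a": ((1, 2, 2), 1), "b": ((4, 3, 3), 1), "c": ((1, 1, 6), 2),
    "d": ((3, 5, 4), 3), "e": ((4, 4, 6), 3), "f": ((8, 7, 7), 3),
    "g": ((9, 8, 7), 4), "h": ((5, 9, 7), 5),
}
K_PRIME = max(cls for _, cls in PROFILE.values())


def partitions(k, k_prime):
    """All non decreasing surjections {1..k} -> {1..k_prime}, as tuples (f(1), ..., f(k))."""
    for cuts in combinations(range(1, k), k_prime - 1):
        bounds = (0,) + cuts + (k,)
        f = []
        for block, (lo, hi) in enumerate(zip(bounds, bounds[1:]), start=1):
            f.extend([block] * (hi - lo))
        yield tuple(f)


def compatible(f):
    """Every act needs a degree within its support that f sends to its class."""
    return all(any(f[e - 1] == cls for e in range(min(s), max(s) + 1))
               for s, cls in PROFILE.values())


solutions = [f for f in partitions(K, K_PRIME) if compatible(f)]
print(solutions)
```

Under these assumptions a single mapping survives, with blocks {e1, e2, e3}, {e4}, {e5, e6, e7}, {e8}, {e9}. Since a compensatory operator is in particular weakly compensatory, no other partition can be compatible with a compensatory operator either, which agrees with the uniqueness stated in the example.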



4 Concluding Remarks 

Our results tell whether a given decision profile can be represented on a given scale
$E$ by a compensatory or a weakly compensatory operator. We see that the necessary and
sufficient conditions for doing this are of two kinds. The first kind involves only
the inherent properties of the decision profile, regardless of $E$, while the second
kind of conditions focuses on $E$ only.

If the first kind of condition is not satisfied, then there is no hope of
representing the decision profile by a (weakly) compensatory operator. By contrast,
if some conditions of the second kind are not satisfied, this is only a matter of
poorness of the scale, i.e. there are not enough degrees to represent the
global preference properly. Thus, a corollary of our theorem could be to exhibit a new
scale $E''$ containing $E$ (i.e. a refinement of the original scale), which is just
sufficient to fulfill all conditions of the second kind. This is of primary importance
from a practical point of view. This problem is addressed in the long version of
this paper.
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Fig. 4. An example of preference representation 
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Abstract. Solving multicriteria decision problems, like choice and ranking, 
requires the use of the DM’s preference model. In this paper we advocate for the
preference model in terms of “if..., then...” decision rules induced from decision
examples provided by the DM. This model has two advantages over the
classical models: (i) it is intelligible and speaks the language of the DM, (ii) the
preference information comes from observation of the DM’s decisions. For a finite
set A of actions evaluated by a family of criteria, we consider the preference
information given in the form of pairwise comparisons of reference actions
presented in a pairwise comparison table (PCT). In the PCT, pairs of actions from a
subset $B \subseteq A$ are described by preference relations on particular criteria and
by a comprehensive outranking relation. Using the dominance-based rough set
approach to the analysis of the PCT, we obtain a rough approximation of the
outranking relation by a dominance relation. Then, a set of “if..., then...”
decision rules is induced from these approximations. The decision rules
constitute the preference model which is then applied to a set $M \subseteq A$ of (new)
actions. As a result, we obtain a four-valued outranking relation on set M. In
order to obtain a final recommendation in the problem of choice or ranking, the
four-valued outranking relation on set M is exploited using a net flow score
procedure.



Keywords. Decision rules, Multicriteria decision analysis, Rough sets, Choice,
Ranking.



1 Introduction 

Construction of a logical model of behavior from observation of an individual’s acts is a
paradigm of artificial intelligence and, in particular, of inductive learning. Solving
multicriteria decision problems, such as choice and ranking, requires the use of DM’s 
(Decision Maker’s) preference model. It is usually a (utility) function or a binary 
relation - its construction requires some preference information from the DM, like 
substitution ratios among criteria, importance weights, or indifference, preference and 
veto thresholds. Acquisition of this preference information from the DM is not easy 
and, moreover, the resulting preference model is not intelligible for the DM. In this 
situation, the preference model in terms of “if..., then...” decision rules induced from 
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decision examples provided by the DM has two advantages over the classical models : 
(i) it is intelligible and speaks the language of the DM, (ii) the preference information 
comes from observation of DM’s decisions. The rule-based preference model and its 
construction are concordant with the above paradigm of artificial intelligence. 

There is, however, a problem with inconsistency often present in the set of 
decision examples. These inconsistencies cannot be considered as simple error or 
noise - they follow from hesitation of the DM, unstable character of his/her 
preferences and incomplete determination of the family of criteria. They can convey 
important information that should be taken into account in the construction of the 
DM’s preference model. Rather than correcting or ignoring these inconsistencies, we
propose to take them into account in the preference model construction using the
rough set concept [19,20,26]. For this purpose, we have extended the original rough 
sets theory in two ways : (i) substituting the classical indiscernibility relation by a 
dominance relation, which permits taking into account the preference order in 
domains (scales) of criteria (attributes), and (ii), substituting the data table by a 
pairwise comparison table, where each row corresponds to a pair of actions described 
by binary relations on particular criteria, which permits approximation of a 
comprehensive preference relation in multicriteria choice and ranking problems. The 
extended rough set approach is called dominance-based rough set approach 
[2,3,4,8,9,10]. 

Given a finite set A of actions evaluated by a family of criteria, we consider the 
preferential information in the form of a pairwise comparison table (PCT) including 
pairs of reference actions from a subset $B \subseteq A$ described by preference relations on
particular criteria and a comprehensive outranking relation. Using the rough set 
approach to the analysis of the PCT, we obtain a rough approximation of the 
outranking relation by a dominance relation. The dominance-based rough set 
approach answers several questions related to the approximation: (a) is the set of 
decision examples consistent? (b) what are the non-redundant subsets of criteria 
ensuring the same quality of approximation as the whole set of criteria? (c) what are 
the criteria which cannot be eliminated from the approximation without decreasing 
the quality of approximation? (d) what minimal “if..., then ...” decision rules can be 
induced from the approximations? The resulting decision rules constitute a preference 
model. It is more general than the classical utility function or any relation. 

Decision rules derived from rough approximations are then applied to a set $M \subseteq A$
of (new) actions. As a result, we obtain a four-valued outranking relation on set M. 
The definition of a suitable exploitation procedure in order to obtain a 
recommendation was an open problem. We proposed an exploitation procedure for 
multicriteria choice and ranking that is based on the net flow score and satisfies some 
desirable properties. 

The paper is organized as follows. In section 2, we present the extension of the 
rough set concept on multicriteria decision problems, with particular emphasis on 
multicriteria choice and ranking problems. It is based on analysis of pairwise 
comparisons of actions, so the rough set approach concerns in this case approximation 
of a preference binary relation by specific dominance relations. These dominance 
relations can be multigraded, when the criteria have quantitative or numerical non- 
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quantitative scales, or without degree of preference, when the criteria have ordinal 
scales. Section 3 presents an illustrative example and section 4 includes conclusions. 



2 Extension of the Rough Set Concept on Multicriteria Decision
Problems

In any multicriteria and/or multiattribute decision problem, no recommendation can 
be elaborated before the DM provides some preferential information suitable to the 
preference model assumed [16,23]. There are two major models used until now in 
multicriteria decision analysis: functional and relational. The functional model has 
been extensively used within the framework of multiattribute utility theory [15]. The 
relational model has its most widely known representation in the form of an 
outranking relation [24] and a fuzzy relation [22]. These models require specific 
preferential information more or less explicitly related with their parameters. For 
example, in the deterministic case, the DM is often asked for pairwise comparisons of 
actions, from which one can assess the substitution rates in the functional model or 
importance weights in the relational model. This kind of preferential information 
seems to be close to the natural reasoning of the DM. He/she is typically more 
confident exercising his/her comparisons than explaining them. The representation of 
this information by functional or relational models seems, however, less natural. 
According to Slovic [25], people make decisions by searching for rules that provide 
good justification of their choices. So, after getting the preferential information in 
terms of exemplary comparisons, it is natural to build the preference model in terms
of “if..., then...” rules. Then, these rules can be applied to a set of potential actions in
order to obtain specific preference relations. From the exploitation of these relations,
a suitable recommendation can be obtained to support the DM in the decision problem at
hand.

The induction of rules from examples is a typical approach of artificial 
intelligence. It is concordant with the principle of posterior rationality by March [18] 
and with aggregation-disaggregation logic by Jacquet-Lagreze [14]. The rules explain 
the preferential attitude of the DM and enable his/her understanding of the reasons for 
his/her preference. As pointed out by Langley and Simon [17], the recognition of the 
rules by the DM justifies their use for decision support. The preference model in the 
form of rules derived from examples fulfils both explanation and recommendation 
tasks. 

In this section we present the main extension of the rough set approach,
resulting in a new methodology of modeling DM’s preferences in terms of decision 
rules. The rules are induced from the preferential information given by the DM in the 
form of examples of decisions. 

More precisely, for A being a finite set of actions (real or fictitious, potential or 
not) considered in a multicriteria problem, the examples of decisions are confined to a 
subset of actions $B \subseteq A$, relatively well known to the DM, called reference actions.
Depending on the type of the multicriteria problem, the examples concern either 
assignment of reference actions to decision classes (sorting problem) or pairwise 
comparisons of reference actions (choice and ranking problems). As to relation of 
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these examples to reality, they can be either historical or simulated. Historical 
examples represent actual decisions made by the DM in the past. Simulated examples 
represent decisions declared by the DM with respect to reference actions, but not 
really made. Selection of reference actions and exemplary decisions is a crucial issue 
for obtaining a robust preference model. The authors are currently working on this 
issue which is also linked to the problem of defining a proper interactive procedure 
for induction and acceptance of decision rules. 

For algorithmic reasons, information about reference actions is represented in the
form of a data table. The rows of the table are labeled by reference actions, whereas
columns are labeled by attributes and entries of the table are attribute-values, called
descriptors. Formally, by a data table we understand the 4-tuple $S = \langle B, Q, V, f \rangle$, where
$B$ is a finite set of reference actions, $Q$ is a finite set of attributes, $V = \bigcup_{q \in Q} V_q$ and $V_q$
is a domain of the attribute $q$, and $f : B \times Q \to V$ is a total function such that $f(x,q) \in V_q$
for every $q \in Q$, $x \in B$, called an information function. A data table $S$ can be seen as a
decision table if in the set of attributes $Q$ there can be distinguished two disjoint sets:
set $C$ of condition attributes and set $D$ of decision attributes.
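
As an illustration of this definition, a data table can be held in a small record type that checks the two requirements on the information function: it must be total, and each value must belong to the domain of its attribute. This is only a sketch; the class name `DataTable`, its field names, and the two-firm table are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataTable:
    """S = <B, Q, V, f>; V is kept as one domain V_q per attribute."""
    actions: tuple      # B, the reference actions
    attributes: tuple   # Q
    domains: dict       # q -> V_q
    values: dict        # (x, q) -> f(x, q)

    def __post_init__(self):
        # f must be total, and each value must lie in the matching domain
        for x in self.actions:
            for q in self.attributes:
                if (x, q) not in self.values:
                    raise ValueError(f"f is undefined for ({x!r}, {q!r})")
                if self.values[(x, q)] not in self.domains[q]:
                    raise ValueError(f"f({x!r}, {q!r}) lies outside V_q")

    def f(self, x, q):
        """Information function."""
        return self.values[(x, q)]


# Hypothetical decision table: C = {"debt_ratio"}, D = {"risk"}.
table = DataTable(
    actions=("firm_A", "firm_B"),
    attributes=("debt_ratio", "risk"),
    domains={"debt_ratio": {"low", "medium", "high"}, "risk": {"low", "high"}},
    values={
        ("firm_A", "debt_ratio"): "low", ("firm_A", "risk"): "low",
        ("firm_B", "debt_ratio"): "high", ("firm_B", "risk"): "high",
    },
)
condition_attributes = ("debt_ratio",)
decision_attributes = ("risk",)
print(table.f("firm_B", "debt_ratio"))
```

Splitting the attributes into the two disjoint tuples at the end is what turns the data table into a decision table in the sense of the text.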

As was already mentioned, the notion of attribute differs from that of criterion 
because the domain (scale) of a criterion has to be ordered according to a decreasing 
or increasing preference, while the domain of the attribute does not have to be 
ordered. Formally, for each $q\in C$ being a criterion there exists an outranking relation
[23] $S_q$ on the set of actions $A$ such that $x S_q y$ means “x is at least as good as y with
respect to criterion q”. We suppose that $S_q$ is a total preorder, i.e., a strongly complete
and transitive binary relation defined on $A$ on the basis of evaluations $f(\cdot,q)$. If domain
$V_q$ for criterion $q$ is quantitative and for each $x,y\in A$, $f(x,q)\ge f(y,q)$ implies $x S_q y$, then
$V_q$ is a scale of preference of criterion $q$. If, however, for criterion $q$, $V_q$ is not
quantitative and/or $f(x,q)\ge f(y,q)$ does not imply $x S_q y$, then in order to define a scale of
preference for criterion $q$, one can choose a function $g_q: A\to\mathbb{R}$ such that for each
$x,y\in A$, $x S_q y$ if and only if $g_q(x)\ge g_q(y)$; to this aim it is enough to order the actions of
$A$ from the worst to the best on criterion $q$ and to assign to $g_q(x)$ consecutive numbers
corresponding to the rank of $x$ in this order, i.e., for $z$ being the worst, $g_q(z)=1$, for $w$
being the second worst, $g_q(w)=2$, and so on. Then, the domain of function $g_q(\cdot)$
becomes a scale of preference for criterion $q$ and the domain $V_q$ is recoded such that
$f(x,q)=g_q(x)$ for every $x\in A$.
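
The rank recoding just described can be sketched in a few lines. This is only an illustration under stated assumptions: the three-level scale and the action names `w1`, `w2`, `w3` are hypothetical, and the helper `recode_to_ranks` is not part of the original method description.

```python
def recode_to_ranks(levels_worst_to_best):
    """Assign consecutive numbers 1, 2, 3, ... to levels ordered from worst to best."""
    return {level: rank for rank, level in enumerate(levels_worst_to_best, start=1)}

# Hypothetical ordinal domain of a criterion q, listed from worst to best.
scale = recode_to_ranks(["sufficient", "medium", "good"])

# Hypothetical evaluations f(x, q) of three actions.
evaluations = {"w1": "good", "w2": "sufficient", "w3": "medium"}

# g_q(x): the recoded evaluation of each action.
g = {action: scale[level] for action, level in evaluations.items()}

def outranks(x, y):
    """x S_q y holds if and only if g_q(x) >= g_q(y)."""
    return g[x] >= g[y]
```

After the recoding, comparing two actions on criterion q reduces to comparing two integers, which is all the dominance relations introduced later require.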

Several attempts have already been made to apply rough set theory to decision
support [21,27]. The original rough set approach is not able, however, to deal with
preference-ordered attribute domains and decision classes. Solving this problem was 
crucial for application of the rough set approach to multicriteria decision analysis 
(MCDA). 

Let us explain briefly why the original rough set approach is not able to deal with
inconsistencies coming from consideration of criteria, i.e. attributes with
preference-ordered domains (scales), like product quality, market share, debt ratio. Consider, for
example, two firms, A and B, evaluated for assessment of bankruptcy risk by a set of
criteria including the “debt ratio” (total debt/total assets). If firm A has a low value
while firm B has a high value of the debt ratio, and evaluations of these firms on
other attributes are equal, then, from the bankruptcy risk point of view, firm A dominates
firm B. Suppose, however, that firm A has been assigned by a DM to a class of higher
risk than firm B. This is obviously inconsistent with the dominance principle. Within
the original rough set approach, the two firms will be considered as just discernible
and no inconsistency will be stated.
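
The two-firm situation can be made concrete with a small check. The sketch below is illustrative only: the numeric evaluations, the criterion names and the integer coding of risk classes (a higher number meaning higher risk) are assumptions, not data from the paper.

```python
def dominates(a, b, gain_criteria, cost_criteria):
    """a dominates b: at least as good on every criterion.

    Gain criteria are better when larger, cost criteria are better when smaller.
    """
    return (all(a[q] >= b[q] for q in gain_criteria)
            and all(a[q] <= b[q] for q in cost_criteria))

def violates_dominance(a, b, risk_a, risk_b, gain_criteria, cost_criteria):
    """True if a dominates b but a was assigned to a strictly higher risk class."""
    return dominates(a, b, gain_criteria, cost_criteria) and risk_a > risk_b

# Hypothetical evaluations: equal on market share, firm A has the lower debt ratio.
firm_a = {"debt_ratio": 0.2, "market_share": 0.3}
firm_b = {"debt_ratio": 0.8, "market_share": 0.3}

# The indiscernibility view only asks whether the descriptions differ.
discernible = firm_a != firm_b

# Hypothetical DM assignment: firm A in risk class 2, firm B in risk class 1.
inconsistent = violates_dominance(firm_a, firm_b, 2, 1,
                                  gain_criteria=["market_share"],
                                  cost_criteria=["debt_ratio"])
```

Here `discernible` is true, so an indiscernibility-based analysis sees nothing wrong, while `inconsistent` flags the violation of the dominance principle.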

In the case of multicriteria choice and ranking problems the decision table in its 
original form does not allow the representation of preference binary relations 
between actions. To handle binary relations within the rough set approach, Greco,
Matarazzo and Slowinski [2] proposed to operate on the so-called pairwise comparison
table (PCT), i.e., with respect to a choice or ranking problem, a decision table whose
rows represent pairs of actions for which multicriteria evaluations and a 
comprehensive preference relation are known. 

Using an indiscernibility relation on the PCT one cannot exploit properly the ordinal
information present in the data. Indiscernibility permits handling the inconsistency which
occurs when two pairs of actions have preferences of the same strength on the considered
criteria but the comprehensive preference relations established for these pairs are
not the same. When dealing with criteria, there may arise also another type of
inconsistency connected with the dominance principle: on a given set of criteria, one 
pair of actions is characterized by some preferences and another pair has all preferences 
of at least the same strength, however, for the first pair we have a comprehensive 
preference and for the other - an inverse comprehensive preference. This is why the 
indiscernibility relation is not able to handle all kinds of inconsistencies connected with 
the use of criteria. For this reason, another way of defining rough approximations and 
decision rules has been proposed, based on the use of graded dominance relations. 

2.1 The Pairwise Comparison Table 

Let $C$ be the set of criteria used for evaluation of actions from $A$. For any criterion
$q\in C$, let $T_q$ be a finite set of binary relations defined on $A$ on the basis of the
evaluations of actions from $A$ with respect to the considered criterion $q$, such that for
every $(x,y)\in A\times A$ exactly one binary relation $t\in T_q$ is verified. More precisely, given
the domain $V_q$ of $q\in C$, if $v'_q, v''_q\in V_q$ are the respective evaluations of $x,y\in A$ by
means of $q$ and $(x,y)\in t$, with $t\in T_q$, then for each $w,z\in A$ having the same evaluations
$v'_q, v''_q$ by means of $q$, $(w,z)\in t$. Furthermore, let $T_d$ be a set of binary relations defined
on set $A$ (comprehensive pairwise comparisons) such that at most one binary
relation $t\in T_d$ is verified for every $(x,y)\in A\times A$.

The preferential information has the form of pairwise comparisons of reference
actions from $B\subseteq A$, considered as exemplary decisions. The pairwise comparison
table (PCT) is defined as data table $S_{PCT}=\langle\mathcal{B},\ C\cup\{d\},\ T_C\cup T_d,\ g\rangle$, where $\mathcal{B}\subseteq B\times B$ is a
non-empty set of exemplary pairwise comparisons of reference actions, $T_C=\bigcup_{q\in C} T_q$, $d$
is a decision corresponding to the comprehensive pairwise comparison
(comprehensive preference relation), and $g:\mathcal{B}\times(C\cup\{d\})\to T_C\cup T_d$ is a total function
such that $g[(x,y),q]\in T_q$ for every $(x,y)\in A\times A$ and for each $q\in C$, and $g[(x,y),d]\in T_d$
for every $(x,y)\in\mathcal{B}$. It follows that for any pair of reference actions $(x,y)\in\mathcal{B}$ there is
verified one and only one binary relation $t\in T_d$. Thus, $T_d$ induces a partition of $\mathcal{B}$. In
fact, data table $S_{PCT}$ can be seen as a decision table, since the set of considered criteria
$C$ and decision $d$ are distinguished.

We assume that the exemplary pairwise comparisons made by the DM can be
represented in terms of graded preference relations (for example "very weak
preference", "weak preference", "strict preference", "strong preference", "very strong
preference") $P_q^h$: for each $q\in C$ and for every $(x,y)\in A\times A$,

$T_q=\{P_q^h,\ h\in H_q\}$,

where $H_q$ is a particular subset of the relative integers and

$x P_q^h y$, $h>0$, means that action x is preferred to action y by degree h with respect to
the criterion q,

$x P_q^h y$, $h<0$, means that action x is not preferred to action y by degree h with respect
to the criterion q,

$x P_q^0 y$ means that action x is similar (asymmetrically indifferent) to action y with
respect to the criterion q.

Within the preference context, the similarity relation $P_q^0$, even if not symmetric,
resembles the indifference relation. Thus, in this case, we call this similarity relation
"asymmetric indifference". Of course, for each $q\in C$ and for every $(x,y)\in A\times A$,
$[x P_q^h y,\ h\ge 0] \Leftrightarrow [y P_q^k x,\ k\le 0]$.

The set of binary relations $T_d$ may be defined in a similar way, but $x P_d^h y$ means
that action x is comprehensively preferred to action y by degree h.

Technically, the modeling of the binary relation $P_q^h$, i.e. the assessment of $h$, can be
organized as follows:

• first, it is observed that for any $q\in C$ there exists a function $c_q: A\to\mathbb{R}$ which is
increasing with respect to the preferences on $q$ (the evaluations of $c_q$ depend on the
evaluations of the total function $f(x,q)$, more precisely $f(x,q)=f(y,q)$ implies
$c_q(x)=c_q(y)$),

• then, it is possible to define a function $k_q:\mathbb{R}^2\to\mathbb{R}$ which measures the strength of
the preference (positive or negative) of x over y (e.g. $k_q[c_q(x),c_q(y)]=c_q(x)-c_q(y)$);
it should satisfy the following properties for all $x,y,z\in A$:

$c_q(x)>c_q(y) \Leftrightarrow k_q[c_q(x),c_q(z)]>k_q[c_q(y),c_q(z)]$,   (i)

$c_q(x)>c_q(y) \Leftrightarrow k_q[c_q(z),c_q(x)]<k_q[c_q(z),c_q(y)]$,   (ii)

$c_q(x)=c_q(y) \Leftrightarrow k_q[c_q(x),c_q(y)]=0$,   (iii)

• next, the domain of $k_q$ can be divided into intervals, using a suitable set of
thresholds $\Delta_q$, for each $q\in C$; these intervals are numbered such that positive
strength intervals get numbers 1, 2, 3, ..., while negative strength intervals, -1, -2,
-3, ..., starting symmetrically from interval no. 0 that includes $k_q[c_q(x),c_q(y)]=0$,

• the value of $h$ in the relation $x P_q^h y$ is then equal to the number of the interval that
includes $k_q[c_q(x),c_q(y)]$, for any $(x,y)\in A\times A$.
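
The four steps above can be sketched as follows. This is a minimal illustration under assumptions: the strength function is taken to be the simple difference given as an example in the text, the thresholds are symmetric around zero, and the threshold values `[0.5, 1.5]` are hypothetical.

```python
from bisect import bisect_left

def strength(cx, cy):
    """k_q[c_q(x), c_q(y)]: here simply the difference of the two evaluations."""
    return cx - cy

def degree(cx, cy, thresholds):
    """Number h of the interval that contains the strength of preference.

    `thresholds` is an increasing list of positive bounds t1 < t2 < ...;
    interval 0 is [-t1, t1], interval 1 is (t1, t2], interval -1 is [-t2, -t1), etc.
    """
    k = strength(cx, cy)
    steps = bisect_left(thresholds, abs(k))  # how many thresholds lie strictly below |k|
    return steps if k >= 0 else -steps

# Hypothetical thresholds for a criterion evaluated on the numeric codes 0, 1, 2.
DELTA_Q = [0.5, 1.5]
```

With these assumed thresholds, `degree(2, 1, DELTA_Q)` gives 1 and `degree(0, 2, DELTA_Q)` gives -2, so a difference of one level is a preference of degree 1 and a difference of two levels a preference of degree 2.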

Actually, property (iii) can be relaxed in order to obtain a more general preference 
model which, for instance, does not satisfy preferential independence [15]. 

We are considering a PCT where the set $T_d$ is composed of two binary relations
defined on $A$:

x outranks y (denoted by $xSy$ or $(x,y)\in S$), where $(x,y)\in\mathcal{B}$,

x does not outrank y (denoted by $xS^c y$ or $(x,y)\in S^c$), where $(x,y)\in\mathcal{B}$,

and $S\cup S^c=\mathcal{B}$, where "x outranks y" means "x is at least as good as y" [23]; observe
that the binary relation $S$ is reflexive, but neither necessarily transitive nor complete.

2.2 Multigraded Dominance 

The graded dominance relation introduced in [2] assumes a common grade of 
preference for all the considered criteria. While this permits a simple calculation of 
the approximations and of the resulting decision rules, it is lacking in precision. A 
dominance relation allowing a different degree of preference for each considered 
criterion (multigraded dominance) gives a far more accurate picture of the preferential 
information contained in the pairwise comparison table $S_{PCT}$.

More formally, given $P\subseteq C$ ($P\ne\emptyset$), $(x,y),(w,z)\in A\times A$, the pair of actions $(x,y)$ is
said to dominate $(w,z)$, taking into account the criteria from $P$ (denoted by
$(x,y)D_P(w,z)$), if x is preferred to y at least as strongly as w is preferred to z with
respect to each $q\in P$. Precisely, "at least as strongly as" means "by at least the same
degree", i.e. $h_q\ge k_q$, where $h_q,k_q\in H_q$, $x P_q^{h_q} y$ and $w P_q^{k_q} z$, for each $q\in P$.

Let $D_{\{q\}}$ be the dominance relation confined to the single criterion $q\in P$. The binary
relation $D_{\{q\}}$ is reflexive ($(x,y)D_{\{q\}}(x,y)$, for every $(x,y)\in A\times A$), transitive
($(x,y)D_{\{q\}}(w,z)$ and $(w,z)D_{\{q\}}(u,v)$ imply $(x,y)D_{\{q\}}(u,v)$, for every $(x,y),(w,z),
(u,v)\in A\times A$), and complete ($(x,y)D_{\{q\}}(w,z)$ and/or $(w,z)D_{\{q\}}(x,y)$, for all $(x,y),
(w,z)\in A\times A$). Therefore, $D_{\{q\}}$ is a complete preorder on $A\times A$. Since the intersection of
complete preorders is a partial preorder and $D_P=\bigcap_{q\in P} D_{\{q\}}$, the dominance
relation $D_P$ is a partial preorder on $A\times A$.
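
A short sketch may help to see why the dominance on several criteria is only a partial preorder. The degree vectors and the criterion names `A1`, `A2`, `A3` below are hypothetical; the function simply compares degrees criterion by criterion.

```python
def dominates(h, k, criteria):
    """(x,y) D_P (w,z): the degree h_q of (x,y) is >= the degree k_q of (w,z) for every q in P."""
    return all(h[q] >= k[q] for q in criteria)

P = ["A1", "A2", "A3"]

# Hypothetical degrees of preference of three pairs of actions on each criterion.
h_xy = {"A1": 2, "A2": 0, "A3": 1}
h_wz = {"A1": 1, "A2": 0, "A3": 1}
h_uv = {"A1": 0, "A2": 1, "A3": 0}

# On a single criterion any two pairs are comparable (complete preorder).
comparable_on_a1 = dominates(h_xy, h_uv, ["A1"]) or dominates(h_uv, h_xy, ["A1"])

# On the whole set P the pairs (x,y) and (u,v) are incomparable (partial preorder).
incomparable_on_p = not dominates(h_xy, h_uv, P) and not dominates(h_uv, h_xy, P)
```

The pair (x,y) dominates (w,z) on P, whereas (x,y) and (u,v) each win on a different criterion, so neither dominates the other.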

Let $R\subseteq P\subseteq C$ and $(x,y),(u,v)\in A\times A$; then the following implication holds:

$(x,y)D_P(u,v) \Rightarrow (x,y)D_R(u,v)$.

Given $P\subseteq C$ and $(x,y)\in A\times A$, we define:

• a set of pairs of actions dominating $(x,y)$, called the P-dominating set,
$D_P^+(x,y)=\{(w,z)\in A\times A: (w,z)D_P(x,y)\}$,

• a set of pairs of actions dominated by $(x,y)$, called the P-dominated set,
$D_P^-(x,y)=\{(w,z)\in A\times A: (x,y)D_P(w,z)\}$.

The P-dominating sets and the P-dominated sets defined on $\mathcal{B}$ for all pairs of
reference actions from $\mathcal{B}$ are “granules of knowledge” that can be used to express
P-lower and P-upper approximations of comprehensive outranking relations $S$ and $S^c$,
respectively:

$\underline{P}(S)=\{(x,y)\in\mathcal{B}: D_P^+(x,y)\subseteq S\}$,

$\overline{P}(S)=\bigcup_{(x,y)\in S} D_P^+(x,y)$,

$\underline{P}(S^c)=\{(x,y)\in\mathcal{B}: D_P^-(x,y)\subseteq S^c\}$,

$\overline{P}(S^c)=\bigcup_{(x,y)\in S^c} D_P^-(x,y)$.

It has been proved in [2] that

$\underline{P}(S)\subseteq S\subseteq\overline{P}(S)$, $\underline{P}(S^c)\subseteq S^c\subseteq\overline{P}(S^c)$.

Furthermore, the following complementarity properties hold:

$\underline{P}(S)=\mathcal{B}-\overline{P}(S^c)$, $\overline{P}(S)=\mathcal{B}-\underline{P}(S^c)$,

$\underline{P}(S^c)=\mathcal{B}-\overline{P}(S)$, $\overline{P}(S^c)=\mathcal{B}-\underline{P}(S)$.

The P-boundaries (P-doubtful regions) of $S$ and $S^c$ are defined as

$Bn_P(S)=\overline{P}(S)-\underline{P}(S)$, $Bn_P(S^c)=\overline{P}(S^c)-\underline{P}(S^c)$.

From the above it follows that $Bn_P(S)=Bn_P(S^c)$.

The concepts of the quality of approximation, reducts and core can be extended
also to the approximation of the outranking relation by multigraded dominance
relations. In particular, the coefficient

$\gamma_P=\dfrac{\mathrm{card}(\underline{P}(S)\cup\underline{P}(S^c))}{\mathrm{card}(\mathcal{B})}$

defines the quality of approximation of $S$ and $S^c$ by $P\subseteq C$. It expresses the ratio of all
pairs of actions $(x,y)\in\mathcal{B}$ correctly assigned to $S$ and $S^c$ by the set $P$ of criteria to all the
pairs of actions contained in $\mathcal{B}$. Each minimal subset $P\subseteq C$, such that $\gamma_P=\gamma_C$, is called
a reduct of $C$ (denoted by $RED_{S_{PCT}}$). Let us remark that $S_{PCT}$ can have more than one
reduct. The intersection of all $\mathcal{B}$-reducts is called the core (denoted by $CORE_{S_{PCT}}$).

Using the approximations defined above, it is then possible to induce a generalized
description of the preferential information contained in a given $S_{PCT}$ in terms of
suitable decision rules. The syntax of these rules is based on the concept of upward
cumulated preferences (denoted by $P_q^{\ge h}$) and downward cumulated preferences
(denoted by $P_q^{\le h}$), having the following interpretation:

• $x P_q^{\ge h} y$ means "x is preferred to y with respect to q by at least degree h",

• $x P_q^{\le h} y$ means "x is preferred to y with respect to q by at most degree h".

Exact definition of the cumulated preferences, for each $(x,y)\in A\times A$, $q\in C$ and
$h\in H_q$, is the following:

• $x P_q^{\ge h} y$ if $x P_q^k y$, where $k\in H_q$ and $k\ge h$,

• $x P_q^{\le h} y$ if $x P_q^k y$, where $k\in H_q$ and $k\le h$.
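
The cumulated preferences reduce to simple comparisons on the degree of preference, and a conjunction of such conditions is what a decision rule tests. The sketch below is illustrative: the criterion names `q1`, `q2` and the helper `satisfies` are assumptions introduced for the example.

```python
def at_least(k, h):
    """Upward cumulated preference: the actual degree k satisfies k >= h."""
    return k >= h

def at_most(k, h):
    """Downward cumulated preference: the actual degree k satisfies k <= h."""
    return k <= h

def satisfies(degrees, lower=None, upper=None):
    """Check a conjunction of 'at least' and 'at most' conditions on a pair of actions.

    `degrees` maps each criterion to the degree by which x is preferred to y;
    `lower` and `upper` map criteria to the bounds h used in the conditions.
    """
    lower, upper = lower or {}, upper or {}
    return (all(at_least(degrees[q], h) for q, h in lower.items())
            and all(at_most(degrees[q], h) for q, h in upper.items()))

# Hypothetical pair (x, y): preferred by degree 2 on q1, not preferred by degree 1 on q2.
pair = {"q1": 2, "q2": -1}
```

For this pair, a condition "at least degree 1 on q1" holds, while "at most degree 0 on q1" does not.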

Using the above concepts, three types of decision rules can be considered:

1) $D_\ge$-decision rules with the following syntax:

if $x P_{q1}^{\ge h(q1)} y$ and $x P_{q2}^{\ge h(q2)} y$ and ... $x P_{qp}^{\ge h(qp)} y$, then $xSy$,

where $P=\{q1,q2,...,qp\}\subseteq C$ and $(h(q1),h(q2),...,h(qp))\in H_{q1}\times H_{q2}\times...\times H_{qp}$; these rules
are supported by pairs of actions from the P-lower approximation of $S$ only;

2) $D_\le$-decision rules with the following syntax:

if $x P_{q1}^{\le h(q1)} y$ and $x P_{q2}^{\le h(q2)} y$ and ... $x P_{qp}^{\le h(qp)} y$, then $xS^c y$,

where $P=\{q1,q2,...,qp\}\subseteq C$ and $(h(q1),h(q2),...,h(qp))\in H_{q1}\times H_{q2}\times...\times H_{qp}$; these rules
are supported by pairs of actions from the P-lower approximation of $S^c$ only;

3) $D_{\ge\le}$-decision rules with the following syntax:

if $x P_{q1}^{\ge h(q1)} y$ and $x P_{q2}^{\ge h(q2)} y$ and ... $x P_{qk}^{\ge h(qk)} y$ and $x P_{qk+1}^{\le h(qk+1)} y$ and ... $x P_{qp}^{\le h(qp)} y$,
then $xSy$ or $xS^c y$,

where $O'=\{q1,q2,...,qk\}\subseteq C$, $O''=\{qk+1,qk+2,...,qp\}\subseteq C$, $P=O'\cup O''$, $O'$ and $O''$
not necessarily disjoint, $(h(q1),h(q2),...,h(qp))\in H_{q1}\times H_{q2}\times...\times H_{qp}$; these rules are
supported by pairs of actions from the P-boundary of $S$ and $S^c$ only.

2.3 Dominance without Degrees of Preference

The degree of graded preference considered above is defined on a quantitative scale
of the strength of preference $k_q$, $q\in C$. However, in many real world problems, the
existence of such a quantitative scale is rather questionable. Roy [23] distinguishes
the following cases: 

• preferences expressed on an ordinal scale: this is the case where the difference 
between two evaluations has no clear meaning; 

• preferences expressed on a quantitative scale: this is the case where the scale is 
defined with reference to a unit clearly identified, such that it is meaningful to 
consider an origin (zero) of the scale and ratios between evaluations (ratio scale); 

• preferences expressed on a numerical non-quantitative scale: this is an 
intermediate case between the previous two; there are two well-known particular 
cases: 

□ interval scale, where it is meaningful to compare ratios between differences 
of pairs of evaluations, 

□ scale for which a complete preorder can be defined on all possible pairs 
of evaluations. 

The strength of preference $k_q$ and, therefore, the graded preference considered in
point 2.1, is meaningful when the scale is quantitative or numerical non-quantitative.
If the information about $k_q$ is not available, then it is possible to define a rough
approximation of $S$ and $S^c$ using a specific dominance relation between pairs of
actions from $A\times A$. This dominance relation is defined directly on an ordinal scale
represented by evaluations $c_q(x)$ on criterion $q$, for all actions $x\in A$ [4]. Let us explain
this latter case in more detail.

Let $C^O$ be the set of criteria expressing preferences on an ordinal scale, and $C^N$, the
set of criteria expressing preferences on a quantitative scale or a numerical
non-quantitative scale, such that $C^O\cup C^N=C$ and $C^O\cap C^N=\emptyset$. Moreover, for each $P\subseteq C$, we
denote by $P^O$ the subset of $P$ composed of criteria expressing preferences on an ordinal
scale, i.e. $P^O=P\cap C^O$, and $P^N$ the subset of $P$ composed of criteria expressing
preferences on a quantitative scale or a numerical non-quantitative scale, i.e.
$P^N=P\cap C^N$. Of course, for each $P\subseteq C$, we have $P=P^N\cup P^O$ and $P^O\cap P^N=\emptyset$.

If $P=P^N$ and $P^O=\emptyset$, then the definition of dominance is the same as in the case of
multigraded dominance (point 2.2). If $P=P^O$ and $P^N=\emptyset$, then, given $(x,y),(w,z)\in A\times A$,
the pair $(x,y)$ is said to dominate the pair $(w,z)$ with respect to $P$ if, for each $q\in P$,
$c_q(x)\ge c_q(w)$ and $c_q(z)\ge c_q(y)$. Let $D_{\{q\}}$ be the dominance relation confined to the single
criterion $q\in P^O$. The binary relation $D_{\{q\}}$ is reflexive ($(x,y)D_{\{q\}}(x,y)$, for every
$(x,y)\in A\times A$), transitive ($(x,y)D_{\{q\}}(w,z)$ and $(w,z)D_{\{q\}}(u,v)$ imply $(x,y)D_{\{q\}}(u,v)$, for all
$(x,y),(w,z),(u,v)\in A\times A$), but non-complete (it is possible that not $(x,y)D_{\{q\}}(w,z)$ and
not $(w,z)D_{\{q\}}(x,y)$ for some $(x,y),(w,z)\in A\times A$). Therefore, $D_{\{q\}}$ is a partial preorder.
Since the intersection of partial preorders is also a partial preorder and $D_P=\bigcap_{q\in P} D_{\{q\}}$,
$P=P^O$, then the dominance relation $D_P$ is also a partial preorder.

If some criteria from $P\subseteq C$ express preferences on a quantitative or a numerical
non-quantitative scale and others on an ordinal scale, i.e. if $P^N\ne\emptyset$ and $P^O\ne\emptyset$, then,
given $(x,y),(w,z)\in A\times A$, the pair $(x,y)$ is said to dominate the pair $(w,z)$ with respect
to criteria from $P$, if $(x,y)$ dominates $(w,z)$ with respect to both $P^N$ and $P^O$. Since the
dominance relation with respect to $P^N$ is a partial preorder on $A\times A$ (because it is a
multigraded dominance) and the dominance with respect to $P^O$ is also a partial
preorder on $A\times A$ (as explained above), then also the dominance $D_P$, being the
intersection of these two dominance relations, is a partial preorder. In consequence,
all the concepts introduced in the previous points can be restored using this specific
definition of dominance relation.



Using the approximations of $S$ and $S^c$ based on the dominance relation defined
above, it is possible to induce a generalized description of the available preferential 
information in terms of decision rules. These decision rules are of the same type as 
the rules already introduced in the previous point; however, the conditions on criteria 
from C° are expressed directly in terms of evaluations belonging to domains of these 
criteria. 



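Let $C_q=\{c_q(x),\ x\in A\}$, $q\in C^O$. The decision rules have then the following syntax:

1) $D_\ge$-decision rules:

if $x P_{q1}^{\ge h(q1)} y$ and ... $x P_{qe}^{\ge h(qe)} y$ and $c_{qe+1}(x)\ge r_{qe+1}$ and $c_{qe+1}(y)\le s_{qe+1}$ and ... $c_{qp}(x)\ge r_{qp}$
and $c_{qp}(y)\le s_{qp}$, then $xSy$,

where $P=\{q1,...,qp\}\subseteq C$, $P^N=\{q1,...,qe\}$, $P^O=\{qe+1,...,qp\}$, $(h(q1),...,h(qe))\in H_{q1}\times...\times H_{qe}$
and $(r_{qe+1},...,r_{qp}),(s_{qe+1},...,s_{qp})\in C_{qe+1}\times...\times C_{qp}$; these rules are supported by pairs of
actions from the P-lower approximation of $S$ only;

2) $D_\le$-decision rules:

if $x P_{q1}^{\le h(q1)} y$ and ... $x P_{qe}^{\le h(qe)} y$ and $c_{qe+1}(x)\le r_{qe+1}$ and $c_{qe+1}(y)\ge s_{qe+1}$ and ... $c_{qp}(x)\le r_{qp}$
and $c_{qp}(y)\ge s_{qp}$, then $xS^c y$,

where $P=\{q1,...,qp\}\subseteq C$, $P^N=\{q1,...,qe\}$, $P^O=\{qe+1,...,qp\}$, $(h(q1),...,h(qe))\in H_{q1}\times...\times H_{qe}$
and $(r_{qe+1},...,r_{qp}),(s_{qe+1},...,s_{qp})\in C_{qe+1}\times...\times C_{qp}$; these rules are supported by pairs of
actions from the P-lower approximation of $S^c$ only;

3) $D_{\ge\le}$-decision rules:

if $x P_{q1}^{\ge h(q1)} y$ and ... $x P_{qe}^{\ge h(qe)} y$ and $x P_{qe+1}^{\le h(qe+1)} y$ and ... $x P_{qf}^{\le h(qf)} y$ and $c_{qf+1}(x)\ge r_{qf+1}$ and
$c_{qf+1}(y)\le s_{qf+1}$ and ... $c_{qg}(x)\ge r_{qg}$ and $c_{qg}(y)\le s_{qg}$ and $c_{qg+1}(x)\le r_{qg+1}$ and $c_{qg+1}(y)\ge s_{qg+1}$ and ...
$c_{qp}(x)\le r_{qp}$ and $c_{qp}(y)\ge s_{qp}$,
then $xSy$ or $xS^c y$,

where $O'=\{q1,...,qe\}\subseteq C$, $O''=\{qe+1,...,qf\}\subseteq C$, $P^N=O'\cup O''$, $O'$ and $O''$ not
necessarily disjoint, $P^O=\{qf+1,...,qp\}$, $(h(q1),...,h(qf))\in H_{q1}\times...\times H_{qf}$, $(r_{qf+1},...,r_{qp}),
(s_{qf+1},...,s_{qp})\in C_{qf+1}\times...\times C_{qp}$; these rules are supported by pairs of actions from the
P-boundary of $S$ and $S^c$ only.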

Procedures for rule induction from rough approximations have been proposed in 
[10,28,29]. 

2.4 Exploitation Procedure 

The decision rules, induced from a given $S_{PCT}$, describe the comprehensive preference
relations $S$ and $S^c$ either exactly ($D_\ge$- and $D_\le$-decision rules) or approximately
($D_{\ge\le}$-decision rules). A set of these rules covering all pairs of $S_{PCT}$ represents a preference
model of the DM who gave the pairwise comparisons of reference actions. Application
of these decision rules on a new subset $M\subseteq A$ of actions induces a specific preference
structure on $M$.

In fact, any pair of actions $(u,v)\in M\times M$ can match the decision rules in one of four
ways:

- at least one $D_\ge$-decision rule and neither $D_\le$- nor $D_{\ge\le}$-decision rules,

- at least one $D_\le$-decision rule and neither $D_\ge$- nor $D_{\ge\le}$-decision rules,

- at least one $D_\ge$-decision rule and at least one $D_\le$-decision rule, or at least one
$D_{\ge\le}$-decision rule, or at least one $D_{\ge\le}$-decision rule and at least one $D_\ge$- and/or at
least one $D_\le$-decision rule,

- no decision rule.

These four ways correspond to the following four situations of outranking,
respectively:

- $uSv$ and not $uS^c v$, that is true outranking (denoted by $uS^T v$),

- $uS^c v$ and not $uSv$, that is false outranking (denoted by $uS^F v$),

- $uSv$ and $uS^c v$, that is contradictory outranking (denoted by $uS^K v$),

- not $uSv$ and not $uS^c v$, that is unknown outranking (denoted by $uS^U v$).

The four above situations, which together constitute the so-called four-valued 
outranking [30], have been introduced to underline the presence and absence of 
positive and negative reasons for the outranking. Moreover, they make it possible to 
distinguish contradictory situations from unknown ones. 

A final recommendation (choice or ranking) can be obtained upon a suitable
exploitation of this structure, i.e. of the presence and the absence of outranking $S$ and
$S^c$ on $M$. A possible exploitation procedure consists in calculating a specific score,
called Net Flow Score, for each action $x\in M$:

$S_{nf}(x)=S^{++}(x)-S^{+-}(x)+S^{-+}(x)-S^{--}(x)$,

where

$S^{++}(x)$ = card({$y\in M$: there is at least one decision rule which affirms $xSy$}),
$S^{+-}(x)$ = card({$y\in M$: there is at least one decision rule which affirms $ySx$}),
$S^{-+}(x)$ = card({$y\in M$: there is at least one decision rule which affirms $yS^c x$}),
$S^{--}(x)$ = card({$y\in M$: there is at least one decision rule which affirms $xS^c y$}).

The recommendation in ranking problems consists of the total preorder determined
by $S_{nf}(x)$ on $M$; in choice problems, it consists of the action(s) $x^*\in M$ such that
$S_{nf}(x^*)=\max_{x\in M}\{S_{nf}(x)\}$.

The procedure described above has been recently characterized with reference to a 
number of desirable properties [13]. 
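
The Net Flow Score can be computed directly from the conclusions affirmed by the matching rules. The sketch below is an illustration under assumptions: the three actions and the two sets of affirmed conclusions are hypothetical, and the function name `net_flow_scores` is not taken from the paper.

```python
def net_flow_scores(actions, affirmed_s, affirmed_sc):
    """Net Flow Score of every action in M.

    affirmed_s  : set of pairs (x, y) for which at least one rule affirms x S y
    affirmed_sc : set of pairs (x, y) for which at least one rule affirms x S^c y
    """
    scores = {}
    for x in actions:
        s_pp = sum((x, y) in affirmed_s for y in actions)    # x S y
        s_pm = sum((y, x) in affirmed_s for y in actions)    # y S x
        s_mp = sum((y, x) in affirmed_sc for y in actions)   # y S^c x
        s_mm = sum((x, y) in affirmed_sc for y in actions)   # x S^c y
        scores[x] = s_pp - s_pm + s_mp - s_mm
    return scores

# Hypothetical set M and rule conclusions.
M = ["a", "b", "c"]
affirmed_s = {("a", "b"), ("a", "c"), ("b", "c")}
affirmed_sc = {("c", "a"), ("b", "a")}

scores = net_flow_scores(M, affirmed_s, affirmed_sc)
best = [x for x in M if scores[x] == max(scores.values())]
```

Every affirmed conclusion adds one point to one action and removes one point from another, so the scores always sum to zero; only their order matters for the recommendation.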



3 An Example 

Let us suppose that a company managing a chain of warehouses wants to buy some 
new warehouses. To choose the best proposals or to rank them all, the managers of 
the company decide to analyze first the characteristics of eight warehouses already 
owned by the company (reference actions). This analysis should give some 
indications for the choice and ranking of the new proposals. Eight warehouses 
belonging to the company have been evaluated by the following three criteria: capacity of
the sales staff ($A_1$), perceived quality of goods ($A_2$) and high traffic location ($A_3$). The
domains (scales) of these attributes are presently composed of three preference-ordered
echelons: $V_1=V_2=V_3=\{$sufficient, medium, good$\}$. The decision attribute (d)
indicates the profitability of warehouses, expressed by the Return On Equity (ROE) 
ratio (in %). Table 1 presents a decision table with the considered reference actions. 



Table 1. Decision table with reference actions

Warehouse   A1           A2           A3        d (ROE %)
1           good         medium       good      10.35
2           good         sufficient   good       4.58
3           medium       medium       good       5.15
4           sufficient   medium       medium    -5
5           sufficient   medium       medium     2.42
6           sufficient   sufficient   good       2.98
7           good         medium       good      15
8           good         sufficient   good      -1.55



With respect to the set of criteria $C=C^N=\{A_1,A_2,A_3\}$ the following multigraded
preference relations $P_i^h$, i=1,2,3, were defined:

- $x P_i^0 y$ (and $y P_i^0 x$), meaning that x is indifferent to y with respect to $A_i$, if
$f(x,A_i)=f(y,A_i)$,

- $x P_i^1 y$ (and $y P_i^{-1} x$), meaning that x is weakly preferred to y with respect to $A_i$, if
$f(x,A_i)$=good and $f(y,A_i)$=medium, or if $f(x,A_i)$=medium and $f(y,A_i)$=sufficient,

- $x P_i^2 y$ (and $y P_i^{-2} x$), meaning that x is preferred to y with respect to $A_i$, if
$f(x,A_i)$=good and $f(y,A_i)$=sufficient,

- X P? y (and y pr^ x), meaning that x is strongly preferred to y with respect to A, if 
f(x,A)=good and f(y,A)=medium, or if f(x,A|)=medium and f(y,A)=sufficient. 

Using the decision attribute, the comprehensive outranking relation was built as
follows: warehouse x is at least as good as warehouse y with respect to profitability
($xSy$) if

$ROE(x)\ge ROE(y)-2\%$.

Otherwise, i.e. if $ROE(x)<ROE(y)-2\%$, warehouse x is not at least as good as
warehouse y with respect to profitability ($xS^c y$).
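
The construction of the comprehensive outranking relation from the ROE values of Table 1 can be sketched as follows. The ROE figures are those of Table 1; the variable and function names are chosen for the illustration only.

```python
# ROE (%) of the eight reference warehouses, as given in Table 1.
ROE = {1: 10.35, 2: 4.58, 3: 5.15, 4: -5.0, 5: 2.42, 6: 2.98, 7: 15.0, 8: -1.55}

def outranks(x, y, margin=2.0):
    """x S y if ROE(x) >= ROE(y) - 2%."""
    return ROE[x] >= ROE[y] - margin

# All 64 ordered pairs of reference warehouses are split between S and S^c.
S = {(x, y) for x in ROE for y in ROE if outranks(x, y)}
SC = {(x, y) for x in ROE for y in ROE if not outranks(x, y)}
```

For instance, warehouse 7 outranks warehouse 1, but warehouse 1 does not outrank warehouse 7, because 10.35 is lower than 15 - 2 = 13.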

The pairwise comparisons of reference actions result in a PCT. The rough set
analysis of the PCT leads to the conclusion that the set of decision examples on reference
actions is inconsistent. The quality of approximation of $S$ and $S^c$ by all criteria from
set $C$ is equal to 0.44. Moreover, $RED_{S_{PCT}}=CORE_{S_{PCT}}=\{A_1,A_2,A_3\}$; this means that
no criterion is superfluous.

The C-lower approximations and the C-upper approximations of $S$ and $S^c$, obtained
by means of multigraded dominance relations, are as follows:

$\underline{C}(S)$ = {(1,2),(1,4),(1,5),(1,6),(1,8),(3,2),(3,4),(3,5),(3,6),(3,8),(7,2),(7,4),(7,5),
(7,6),(7,8)},

$\underline{C}(S^c)$ = {(2,1),(2,7),(4,1),(4,3),(4,7),(5,1),(5,3),(5,7),(6,1),(6,3),(6,7),(8,1),(8,7)}.

All the remaining 36 pairs of reference actions belong to the C-boundaries of $S$ and
$S^c$, i.e. $Bn_C(S)=Bn_C(S^c)$.
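
These figures can be checked with a short script. It is a sketch under one assumption: the degree of preference on each criterion is coded as the difference of level numbers (sufficient = 0, medium = 1, good = 2), which gives degrees from -2 to 2. The evaluations and ROE values are those of Table 1; all names are chosen for the illustration.

```python
LEVEL = {"sufficient": 0, "medium": 1, "good": 2}
# Table 1: (A1, A2, A3, ROE %) for each reference warehouse.
TABLE = {
    1: ("good", "medium", "good", 10.35),
    2: ("good", "sufficient", "good", 4.58),
    3: ("medium", "medium", "good", 5.15),
    4: ("sufficient", "medium", "medium", -5.0),
    5: ("sufficient", "medium", "medium", 2.42),
    6: ("sufficient", "sufficient", "good", 2.98),
    7: ("good", "medium", "good", 15.0),
    8: ("good", "sufficient", "good", -1.55),
}
PAIRS = [(x, y) for x in TABLE for y in TABLE]  # the 64 pairs of the PCT

def degrees(x, y):
    """Degree of preference of x over y on A1, A2, A3 (assumed: level difference)."""
    return tuple(LEVEL[TABLE[x][i]] - LEVEL[TABLE[y][i]] for i in range(3))

def dominates(p, r):
    """Pair p dominates pair r: degree at least as high on every criterion."""
    return all(a >= b for a, b in zip(degrees(*p), degrees(*r)))

# Comprehensive outranking: x S y if ROE(x) >= ROE(y) - 2%.
S = {(x, y) for x, y in PAIRS if TABLE[x][3] >= TABLE[y][3] - 2.0}
SC = set(PAIRS) - S

# Lower approximations: every dominating (resp. dominated) pair stays in S (resp. S^c).
lower_S = {p for p in PAIRS if all(r in S for r in PAIRS if dominates(r, p))}
lower_SC = {p for p in PAIRS if all(r in SC for r in PAIRS if dominates(p, r))}
# Upper approximation of S: union of the dominating sets of the pairs in S.
upper_S = {r for p in S for r in PAIRS if dominates(r, p)}
boundary = upper_S - lower_S
quality = len(lower_S | lower_SC) / len(PAIRS)
```

Under this coding the script returns 15 pairs in the lower approximation of $S$, 13 pairs in the lower approximation of $S^c$ and 36 boundary pairs, so the quality of approximation is $(15+13)/64 = 0.4375 \approx 0.44$.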

The following minimal $D_\ge$-decision rules and $D_\le$-decision rules can be induced from
lower approximations of $S$ and $S^c$, respectively (within parentheses there are the pairs
of actions supporting the corresponding rules):

if $x P_1^{\ge 1} y$ and $x P_2^{\ge 1} y$, then $xSy$; ((1,6),(3,6),(7,6)),

if $x P_2^{\ge 1} y$ and $x P_3^{\ge 0} y$, then $xSy$; ((1,2),(1,6),(1,8),(3,2),(3,6),(3,8),(7,2),(7,6),(7,8)),

if $x P_2^{\ge 0} y$ and $x P_3^{\ge 1} y$, then $xSy$; ((1,4),(1,5),(3,4),(3,5),(7,4),(7,5)),

if $x P_1^{\le -1} y$ and $x P_2^{\le -1} y$, then $xS^c y$; ((6,1),(6,3),(6,7)),

if $x P_2^{\le 0} y$ and $x P_3^{\le -1} y$, then $xS^c y$; ((4,1),(4,3),(4,7),(5,1),(5,3),(5,7)),

if $x P_1^{\le 0} y$ and $x P_2^{\le -1} y$ and $x P_3^{\le 0} y$, then $xS^c y$; ((2,1),(2,7),(6,1),(6,3),(6,7),(8,1),(8,7)).

Moreover, it was possible to induce five minimal $D_{\ge\le}$-decision rules from the
boundary of approximation of $S$ and $S^c$:

if $x P_2^{\ge 0} y$ and $x P_3^{\ge 0} y$ and $x P_2^{\le 0} y$ and $x P_3^{\le 0} y$, then $xSy$ or $xS^c y$; ((1,1),(1,3),(1,7),
(2,2),(2,6),(2,8),(3,1),(3,3),(3,7),(4,4),(4,5),(5,4),(5,5),(6,2),(6,6),(6,8),(7,1),(7,3),
(7,7),(8,2),(8,6),(8,8)),

if $x P_3^{\ge 1} y$ and $x P_2^{\le -1} y$, then $xSy$ or $xS^c y$; ((2,4),(2,5),(6,4),(6,5),(8,4),(8,5)),

if $x P_2^{\ge 1} y$ and $x P_3^{\le -1} y$, then $xSy$ or $xS^c y$; ((4,2),(4,6),(4,8),(5,2),(5,6),(5,8)),

if $x P_1^{\ge 1} y$ and $x P_2^{\le 0} y$ and $x P_3^{\le 0} y$, then $xSy$ or $xS^c y$; ((1,3),(2,3),(2,6),(7,3),(8,3),(8,6)),

if $x P_1^{\ge 1} y$ and $x P_2^{\le -1} y$, then $xSy$ or $xS^c y$; ((2,3),(2,4),(2,5),(8,3),(8,4),(8,5)).

Using all the above decision rules and the Net Flow Score exploitation procedure on
ten other warehouses proposed for sale, the managers obtained the result presented in
Table 2. The dominance-based rough set approach gives a clear recommendation:

• for the choice problem it suggests selecting warehouses 2' and 6', having
maximum score (9),

• for the ranking problem it suggests the ranking presented in the last column of
Table 2:

(2',6') → (8') → (9') → (1') → (4') → (5') → (3') → (7',10')
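
The ranking can be re-derived from the Net Flow Scores as printed in Table 2. The sketch below assumes that tied warehouses share a rank and that the following rank is skipped, which is how the ranking column of Table 2 appears to be numbered; the function name `ranking_from_scores` is introduced for the illustration.

```python
# Net Flow Scores of the ten warehouses for sale, as printed in Table 2.
SCORES = {"1'": 1, "2'": 11, "3'": -8, "4'": 0, "5'": -4,
          "6'": 11, "7'": -11, "8'": 7, "9'": 4, "10'": -11}

def ranking_from_scores(scores):
    """Rank actions by decreasing score; tied actions share the same rank."""
    ordered = sorted(scores.values(), reverse=True)
    # index() returns the first position of a score, so ties get the same rank.
    return {action: ordered.index(score) + 1 for action, score in scores.items()}

ranking = ranking_from_scores(SCORES)
best = [a for a, s in SCORES.items() if s == max(SCORES.values())]
```

With the scores of Table 2, `best` contains warehouses 2' and 6', and `ranking` reproduces the last column of the table.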



Table 2. Ranking of warehouses for sale by decision rules and the Net Flow Score exploitation
procedure

Warehouse for sale   A1           A2           A3           Net Flow Score   Ranking
1'                   good         sufficient   medium         1              5
2'                   sufficient   good         good          11              1
3'                   sufficient   medium       sufficient    -8              8
4'                   sufficient   good         sufficient     0              6
5'                   sufficient   sufficient   medium        -4              7
6'                   sufficient   good         good          11              1
7'                   medium       sufficient   sufficient   -11              9
8'                   medium       medium       medium         7              3
9'                   medium       good         sufficient     4              4
10'                  medium       sufficient   sufficient   -11              9



4 Conclusions

In this paper, we made a synthesis of the contribution of the extended rough sets 
theory to multicriteria choice and ranking problems. Classical use of the rough set 
approach, and more generally, of machine learning, data mining and knowledge 
discovery, is confined to problems of multiattribute classification, i.e. to problems 
where neither the values of attributes describing the actions, nor the classes to which 
the actions are assigned, are preference-ordered. On the other hand, MCDA deals with 
problems where descriptions (evaluations) of actions by means of criteria, as well as 
decisions in sorting, choice and ranking problems, are preference-ordered. The
extension of the rough set approach to problems in which preference-order properties
are meaningful is possible thanks to two important methodological contributions
extensively discussed in this paper:

1) approximation of the comprehensive preference relation by dominance relations,
which makes it possible to deal with preference-order properties of criteria,

2) analysis of decision examples in a pairwise comparison table, which makes it possible to
represent preference relations for choice and ranking problems.

Let us point out the main advantages of the dominance-based rough set approach to 
MCDA in comparison with classical approaches: 

□ preferential information necessary to deal with a multicriteria decision problem
is requested from the DM in terms of exemplary decisions,

□ the rough set analysis of preferential information supplies some useful elements 
of knowledge about the decision situation; these are: the relevance of particular 
attributes and/or criteria, information about their interaction (from quality of
approximation and its analysis using fuzzy measures theory), minimal subsets of
attributes or criteria (reducts) conveying an important knowledge contained in 
the exemplary decisions, the set of the non-reducible attributes or criteria (core), 

□ the preference model induced from the preferential information is expressed in a 
natural and comprehensible language of "if..., then..." decision rules; the decision 
rules concern pairs of actions and conclude either presence or absence of a 
comprehensive preference relation; conditions for the presence are expressed in 
“at least” terms, and for the absence in “at most” terms, on particular criteria, 

□ the decision rules do not convert ordinal information into numeric one but keep 
the ordinal character of input data due to the syntax proposed, 

□ heterogeneous information (qualitative and quantitative, ordered and non- 
ordered) and scales of preference (ordinal, quantitative and numerical non- 
quantitative) can be processed within the dominance-based rough set approach, 
while classical MCDM methods consider only quantitative ordered evaluations 
with rare exceptions, 

□ no prior discretization of the quantitative domains of criteria is necessary,

□ the decision rule preference model resulting from the rough set approach is more
general than all existing models of conjoint measurement due to its capacity of 
handling inconsistent preferences, 

□ the proposed methodology is based on elementary concepts and mathematical 
tools (sets and set operations, binary relations), without recourse to any algebraic 
or analytical structures; main ideas such as indiscernibility and dominance are
very natural and difficult to contest. 

As to axiomatic foundation of the decision rule preference model for multicriteria 
choice and ranking, it has been recently proposed by the authors in [11,12]. We 
proved the equivalence of preference representation by a general non-additive and 
non-transitive model of conjoint measurement and by “if..., then...” decision rules.
Moreover, some well known multicriteria aggregation procedures (lexicographic 
aggregation, majority aggregation, ELECTRE I and TACTIC) were represented in 
terms of the decision rule model; in these cases the decision rules decompose the 
synthetic aggregation formula used by these procedures; the rules involve partial 
profiles defined for subsets of criteria plus a dominance relation on these profiles and 
pairs of actions. Such decomposition makes the preference model more 
understandable for the decision maker. 

It is also worth noting that the rough set approach to information processing is 
complementary to the fuzzy set approach because roughness represents granularity of 
knowledge and fuzziness represents gradedness of similarity, uncertainty and 
preference. The proof of this complementarity was given in [1,5,6] where fuzzy 
similarity, fuzzy dominance and fuzzy multigraded dominance were considered in 
classification, sorting and choice/ranking problems, respectively. The dominance- 
based rough set approach has also been adapted to handle missing values in the 
decision table [7]. 
Abstract. Distances between possible worlds play an important role in
logic-based knowledge representation (especially in belief change, reasoning
about action, belief merging and similarity-based reasoning). We
show here how they can be used for representing in a compact and intuitive
way the preference profile of an agent, following the principle that
given a goal G, the closer a world w is to a model of G, the better
w. We give an integrated logical framework for preference representation
which handles weighted goals and distances to goals in a uniform way.
Then we argue that the widely used Hamming distance (which merely
counts the number of propositional symbols assigned a different value
by two worlds) is generally too rudimentary and too syntax-sensitive to
be suitable in real applications; therefore, we propose a new family of
distances, based on Choquet integrals, in which the Hamming distance
occupies a position very similar to that of the arithmetic mean in the
class of Choquet integrals for multi-criteria decision making.



1 Introduction 

The specification of a decision making or a planning process necessarily includes 
the goals, desires and preferences of the agent (we will make use of the generic 
term “preference” for all of these notions - a goal will merely be a specific, strong 
form of preference). At the object level, the preference structure can be either
a utility function assigning a numerical value to each possible consequence, or a
weak order relation (possibly allowing for incomparability in addition to indifference).
Now, once it has been fixed what a preference structure is, the question of
how it should be represented, or in other terms, how it should be encoded so as to
be “computationally friendly”, arises. A straightforward possibility consists in
writing it down explicitly, namely by listing all possible consequences together 
with their utility values or by listing all pairs of consequences in the preference 
relation. Clearly, this explicit mode of representation is possible in practice only 
when the number of possible consequences is reasonably low with respect to the 
available computational resources. This is often not the case, in particular when 
the decision to take consists in giving a value to each of the variables of a set of 
n decision variables: here, the set of consequences is equal to the set of feasible 
assignments and thus grows exponentially with n. 
Therefore, it is obviously unreasonable to require from the agents the specification
of an explicit utility function or preference ordering (in the form of
a table or a list) on the set of all solutions. This argues for the need of a compact
(or succinct) representation language for preferences. Furthermore, such a
language for preference representation should be as expressive as possible, and
as close as possible to intuition (ideally, it should be easy to translate a
natural-language specification of the preferences of an agent into it). Lastly, it should
be equipped with decision procedures, as efficient as possible, so as to enable the
automation of the search for an optimal collective decision.

The KR community has contributed a lot to the study of such succinct and
expressive languages for representing knowledge^1. Now, for the last few years,
there have been several approaches dedicated to the representation of the preferences
of an agent, to be used in a decision making process. They are briefly recalled
in Section 2. We particularly focus on two families of representation languages,
whose induced preference structure is a utility function: first, weighted logics,
which consist in associating with each formula (representing an elementary goal)
a positive weight representing the penalty for not satisfying this goal (together
with a way of aggregating penalties coming from different violated goals); second,
distance-based logics, where the violation of a goal is graded using a distance
between worlds: given a goal (encoded as a propositional formula) G and given a
distance d between worlds, the closer a world w is to a model of G, the better
w (the optimal case being when w satisfies G).

In [13] we showed that in some specific cases, a goal base written within the 
framework of distance-based logics could be translated into a goal base written 
within the framework of weighted logics and vice-versa. Here we go further by 
showing that this translation can be done independently of the chosen distance 
and aggregation function, and we propose an integrated logical framework which 
enables both the expression of weighted goals and distance-labelled goals in a 
uniform way. This is done in Section 3. 

In Section 4 we focus on the practical choice of a distance between worlds.
We first argue that the most widely used distance in the Knowledge Representation
community, namely the Hamming distance, is very poor with respect to expressivity
and robustness to small changes in the propositional language. This leads
us to propose alternative and more general families of propositional distances:
we recast the specification of propositional distances in the framework of multicriteria
decision making and we introduce the class of Choquet propositional
distances, defined as a Choquet integral. We will show that the position of the
Hamming distance in this family (to which it belongs) is the same as the position
of the arithmetic mean in the family of Choquet integrals; namely, both are
central but “degenerate” members of their respective families.



^1 Here we reserve the term “knowledge” for its restrictive meaning “information about
the real world”, thus excluding preferences, goals and desires.
2 Logical Representation of Preferences

2.1 Notations, Definitions, and Basic Assumptions 

From now on, $\mathcal{L}$ denotes a propositional language built from a finite set of propositional
symbols PS, the usual connectives, and two symbols $\top$ and $\bot$ denoting
respectively tautology and contradiction. A literal is either a propositional symbol
or its negation. A clause (resp. a cube) is a disjunction (resp. a conjunction)
of literals. A formula is under CNF (resp. DNF) iff it is a conjunction of clauses
(resp. a disjunction of cubes).

$\Omega = 2^{PS}$ is the set of truth assignments (called interpretations, or worlds)
for $\mathcal{L}$. A world $w$ on PS is denoted by listing the set of literals it satisfies: for
instance, if $PS = \{a, b, c, d\}$, then the world in which a and c are false and b
and d are true is written as $(\neg a, b, \neg c, d)$. If T is a nonempty subset of PS,
then $w^{\rightarrow T}$ denotes the projection of $w$ on T. For instance, $(\neg a, b, \neg c, d)^{\rightarrow \{b, c\}}$ is
the partial world in which b is true and c is false, written as $(b, \neg c)$.

$Diff(w, w')$ denotes the set of propositional symbols that are assigned a different
value by $w$ and $w'$. For instance, if $w = (\neg a, b, \neg c, d)$ and $w' = (\neg a, \neg b, c, d)$
then $Diff(w, w') = \{b, c\}$. $Mod(\varphi)$ is the set of all models of $\varphi$. We write $w \models \varphi$
when $w \in Mod(\varphi)$, i.e., $\varphi$ is true in $w$, and $\varphi \models \psi$ when $\psi$ is a logical consequence
of $\varphi$, i.e., $Mod(\varphi) \subseteq Mod(\psi)$.

Throughout this paper we make the assumption that the set of feasible worlds
is $Mod(K)$ where $K$ is a propositional formula expressing integrity constraints.
Therefore, the preference structure can be defined on $Mod(K)$. By default we
assume $K = \top$.

Lastly, we introduce the notions of [weak] [pseudo-] distances between propo- 
sitional worlds. 

Definition 1

A (propositional) weak distance is a mapping $d : \Omega \times \Omega \to \mathbb{N}$ satisfying

Sep $\forall w, w'$, $d(w, w') = 0 \Leftrightarrow w = w'$

Sym $\forall w, w'$, $d(w, w') = d(w', w)$

If d satisfies furthermore

TI $\forall w, w', w''$, $d(w, w'') \le d(w, w') + d(w', w'')$

then d is a distance.

If $\varphi$ and $\psi$ are formulas and $w \in \Omega$ then $d(w, \varphi) = \min_{w' \models \varphi} d(w, w')$ and
$d(\varphi, \psi) = \min_{w \models \varphi,\ w' \models \psi} d(w, w')$.

The Hamming distance $d_H$ is defined as the number of propositional symbols
that are assigned different values in the two worlds $w$, $w'$, i.e., $d_H(w, w') =
|Diff(w, w')|$. The choice of the propositional Hamming distance is by far the
most frequent one in the Knowledge Representation community.

The drastic (or Dirac) distance $d_\delta$ is defined by $d_\delta(w, w') = 0$ if $w = w'$, and $d_\delta(w, w') = 1$ otherwise.
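
As an illustration, the notions of this subsection can be sketched in a few lines of Python. This is only a sketch under stated assumptions: worlds are encoded as dictionaries from symbols to truth values, formulas as Python predicates on worlds, and all function names are hypothetical choices made for this example; d(w, φ) is computed by brute-force enumeration of Ω, which is only feasible for small PS.

```python
from itertools import product

PS = ("a", "b", "c", "d")  # propositional symbols of the running example

def worlds(symbols=PS):
    """Enumerate all truth assignments (worlds) over the given symbols."""
    return [dict(zip(symbols, values))
            for values in product((False, True), repeat=len(symbols))]

def diff(w, w2):
    """Diff(w, w'): symbols assigned a different value by the two worlds."""
    return {s for s in w if w[s] != w2[s]}

def hamming(w, w2):
    """Hamming distance: number of symbols on which the worlds differ."""
    return len(diff(w, w2))

def drastic(w, w2):
    """Drastic distance: 0 if the worlds coincide, 1 otherwise."""
    return 0 if w == w2 else 1

def dist_to_formula(d, w, formula, symbols=PS):
    """d(w, phi): minimum of d(w, w') over the models w' of phi.

    `formula` is a predicate on worlds (an encoding assumed for this
    sketch) and must have at least one model.
    """
    return min(d(w, w2) for w2 in worlds(symbols) if formula(w2))

w = {"a": False, "b": True, "c": False, "d": True}
w2 = {"a": False, "b": False, "c": True, "d": True}
goal = lambda v: v["a"] and v["c"]          # the formula a AND c

print(sorted(diff(w, w2)))                  # ['b', 'c']
print(hamming(w, w2))                       # 2
print(dist_to_formula(hamming, w, goal))    # 2
```

The world w falsifies both a and c, so its closest model of the goal under the Hamming distance is two flips away, whereas the drastic distance only records that the goal is violated.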
2.2 Logical Representation of Preference: Brief Overview

A common way for an agent to specify its preferences consists in listing a
set of goals, each consisting of a propositional formula, and possibly some extra
information such as weights, priorities, contexts or distances. How the preference
relation or the utility function is determined from the set of goals depends on
how these goals are understood. We list here (by increasing complexity) the most
common constructions of preference structures from logical specifications. In the
sequel, the $G_i$'s and the $C_i$'s denote propositional formulas, the $\alpha_i$'s denote
numbers in a numerical scale (for example $\mathbb{N}$, or $\mathbb{R}$, or [0, 1], or a finite scale
etc.), and the $d_i$'s denote weak distances. GB is a “goal base” (by analogy with
“knowledge base”) and $u_{GB}$ (resp. $\ge_{GB}$) denotes the utility function (resp. the
preference relation) induced by GB.

1. $GB = \{G_1, \dots, G_n\}$ and $u_{GB}(w) = 1$ if $w \models G_1 \wedge \dots \wedge G_n$, and $u_{GB}(w) = 0$ otherwise. This
rough representation assumption (the preference structure - a binary utility
- cannot be more elementary) considers crisp goals, interpreted conjunctively
and in a dependent way (i.e., the violation of any of the goals cannot
be compensated by the satisfaction of another one).

2. $GB = \{G_1, \dots, G_n\}$ and $u_{GB}(w) = |\{i \mid w \models G_i\}|$. This refinement of 1 considers
crisp but independent goals and allows for compensation.

3. $GB = \{G_1, \dots, G_n\}$ and $w \ge_{GB} w'$ if and only if $\{i \mid w \models G_i\} \supseteq \{i \mid w' \models G_i\}$.
This (partial) preorder on worlds is nothing but the Pareto ordering.

4. $GB = (\{G_1, \dots, G_n\}, >)$ where $>$ is a priority order on $\{1, \dots, n\}$; $\ge_{GB}$ is then
computed in a way extending (2) and (3) so as to take the priority relation
between goals into account. See for instance [5].

5. $GB = \{(\alpha_1, G_1), \dots, (\alpha_n, G_n)\}$ and

$u_{GB}(w) = F_1(F_2(\{\alpha_i \mid w \models G_i\}), F_3(\{\alpha_j \mid w \models \neg G_j\}))$

i.e., $u_{GB}(w)$ is a function of the weights of the goals satisfied by w and of
the weights of the goals violated by w. When only relative utilities (i.e.,
differences of utilities between worlds) are important (which is usually the
case), we can make the simplifying assumption that only the violated goals
count (negative representation of preference [21] [12])^2, which leads to this
simplified expression: $u_{GB}(w) = F(\{\alpha_i \mid w \models \neg G_i\})$. F should of course
satisfy some specific properties (see [13]).

6. $GB = \{C_1 : G_1, \dots, C_n : G_n\}$. $C_i$ is the context of $G_i$ (conditional goals).

7. $GB = (\{G_1, \dots, G_n\}, d)$ where d is a weak distance; and

$u_{GB}(w) = F(d(w, G_1), \dots, d(w, G_n))$

where F is non-increasing in each of its arguments. The intuition is that
ideally, the world w satisfies G, and when it is not the case, then the further

^2 Or, symmetrically, that only the satisfied goals count (positive representation of preference).




C. Lafage and J. Lang 



to G the world w is, the less preferred it is w.r.t. the satisfaction of G. In
a more general framework, the distance does not have to be the same for all goals, i.e.,
we may start with $GB = \{(d_1, G_1), \dots, (d_n, G_n)\}$ and induce u by $u_{GB}(w) =
F(d_1(w, G_1), \dots, d_n(w, G_n))$.
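
The simplest constructions of this list can be sketched as follows. This is a hedged illustration rather than part of the formal development: goals are encoded as Python predicates on worlds, the binary utility of construction 1 is assumed to take the values 1 and 0, construction 5 is shown in its simplified form as a penalty aggregated with + by default, and all names are hypothetical.

```python
def u_conjunctive(goals, w):
    """Construction 1: binary utility, 1 iff every goal holds in w."""
    return 1 if all(g(w) for g in goals) else 0

def u_count(goals, w):
    """Construction 2: number of goals satisfied by w."""
    return sum(1 for g in goals if g(w))

def pareto_geq(goals, w, w2):
    """Construction 3: w >= w2 iff w satisfies a superset of the goals w2 satisfies."""
    def satisfied(v):
        return {i for i, g in enumerate(goals) if g(v)}
    return satisfied(w) >= satisfied(w2)

def penalty(weighted_goals, w, F=sum):
    """Construction 5 (simplified): aggregate the weights of the violated goals."""
    return F([alpha for alpha, g in weighted_goals if not g(w)])

goals = [lambda v: v["a"], lambda v: v["b"] or v["c"], lambda v: not v["c"]]
weighted = list(zip((5, 2, 1), goals))      # weights 5, 2 and 1

w1 = {"a": True, "b": True, "c": False}     # satisfies all three goals
w2 = {"a": True, "b": False, "c": True}     # violates the third goal
w3 = {"a": False, "b": False, "c": True}    # violates the first and third goals
w4 = {"a": False, "b": True, "c": False}    # violates the first goal only

print(u_count(goals, w1), u_count(goals, w2), u_count(goals, w3))  # 3 2 1
print(pareto_geq(goals, w2, w4), pareto_geq(goals, w4, w2))        # False False
print(penalty(weighted, w3))                                       # 6
```

The worlds w2 and w4 satisfy the same number of goals but are incomparable for the Pareto ordering, which is the kind of difference between constructions 2 and 3 that the list points at.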

In this paper we focus on items 5 (weighted logics) and 7 (distance-based
logics). It has been shown in [13] that these two representation languages can
be translated into each other (under some assumptions). Here we go a
bit further and propose a representation language which integrates both, namely,
where preference items are triples composed of a weight, a distance and a goal.

3 An Integrated Framework for the Logical Representation
of Weighted and Distance-Graded Goals

In the sequel, we assume that individual goals are specified together with a 
weight and/or a distance. 

Definition 2 A WD goal base is a finite collection of triples
$GB = \{(\alpha_1, d_1, G_1), \dots, (\alpha_n, d_n, G_n)\}$ where, for each i, $\alpha_i \in \mathbb{N}$, $d_i$ is a weak
propositional distance and $G_i$ is a consistent propositional formula.

We now have to choose how a preference structure should be induced from
GB. The preference structure will be numerical, i.e., a utility function. By convention,
we will work with disutility functions on $\mathbb{N}$ (or equivalently, utility
functions on $\mathbb{Z}^-$): elementary disutilities, or penalties, are induced by violation
of goals, and ideal worlds, if any, have a null disutility, and therefore satisfy all
goals. A disutility function will be denoted by disu, with the implicit assumption
$disu(w) = -u(w)$.

Definition 3 Let $GB = \{(\alpha_1, d_1, G_1), \dots, (\alpha_n, d_n, G_n)\}$ be a WD goal base. Then

$disu_{GB}(w) = F(H(\alpha_1, d_1(w, G_1)), \dots, H(\alpha_n, d_n(w, G_n)))$

where

- F is a commutative, associative, non-decreasing operator on $\mathbb{N}$, and has 0
as neutral element;

- $H : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ is non-decreasing in both arguments and satisfies
$H(\alpha, x) = 0 \Leftrightarrow x = 0$.

Common choices for F are max and + (see [13], especially for an explanation
why associativity is required). The operator H, whose role is to aggregate the 
weight of the goal (reflecting its importance) with the dissatisfaction stemming 

^3 Note that F has the properties of triangular conorms, except that it is defined on
$\mathbb{N}$ instead of [0, 1].
from the distance to the goal, does not have to be commutative. Monotonicity
is natural; the second property means that there is no penalty only if the goal is 
fully satisfied, i.e., w satisfies it. We consider three possible choices for H, which 
differ in their interpretation of the weight of a goal: 

1. $H_1(\alpha, x) = \min(\alpha, x)$: $\alpha$ is an upper bound of the penalty; once the distance
to the goal is at least $\alpha$, the goal is considered as completely violated: no
distinction is made between a distance of $\alpha$ and a larger distance.

2. $H_2(\alpha, x) = 0$ if $x = 0$, and $H_2(\alpha, x) = \alpha + x - 1$ if $x > 0$: $\alpha$ is the minimal penalty for a violation
(even small) of the goal; after this penalty is taken into account, penalty
grows linearly with distance to the goal;

3. $H_3(\alpha, x) = \alpha \cdot x$: $\alpha$ determines the rate of growth of the disutility function.

The choice of H may be further guided by the following consideration: we
would like our framework to degenerate (a) into the framework of weighted
logics when all distances are taken to be the drastic distance $d_\delta$ and (b) into
the framework of distance-based logics when all weights are 1. This imposes the
further constraints (a) $H(\alpha, 1) = \alpha$ and (b) $H(1, x) = x$, and therefore excludes
$H_1$. From now on we assume (unless explicitly stated) that $H \in \{H_2, H_3\}$.
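
The following sketch shows how the disutility of Definition 3 could be computed for small examples, and checks the two degeneracy constraints on the three candidate operators. It is an illustration under assumptions: goals are Python predicates, distances to goals are obtained by brute-force enumeration of the worlds, F is passed as a function on a list of penalties (`sum` for +, `max` for max), and the function names are hypothetical.

```python
from itertools import product

def worlds(symbols):
    return [dict(zip(symbols, vals))
            for vals in product((False, True), repeat=len(symbols))]

def hamming(w, w2):
    return sum(w[s] != w2[s] for s in w)

def drastic(w, w2):
    return 0 if w == w2 else 1

def dist_to_goal(d, w, goal, symbols):
    """d(w, G): distance from w to the closest model of the (consistent) goal."""
    return min(d(w, w2) for w2 in worlds(symbols) if goal(w2))

def h1(alpha, x):
    return min(alpha, x)

def h2(alpha, x):
    return 0 if x == 0 else alpha + x - 1

def h3(alpha, x):
    return alpha * x

def disutility(goal_base, w, symbols, H=h3, F=sum):
    """disu_GB(w) for a goal base given as a list of (alpha, d, goal) triples."""
    return F([H(alpha, dist_to_goal(d, w, goal, symbols))
              for alpha, d, goal in goal_base])

SYMBOLS = ("a", "b", "c")
GB = [(3, hamming, lambda v: v["a"] and v["b"]),
      (2, hamming, lambda v: not v["c"])]

w0 = {"a": False, "b": False, "c": False}   # distance 2 to the first goal
w1 = {"a": True, "b": False, "c": True}     # distance 1 to each goal

print(disutility(GB, w0, SYMBOLS, H=h3))         # 6
print(disutility(GB, w0, SYMBOLS, H=h2))         # 4
print(disutility(GB, w1, SYMBOLS, H=h3, F=max))  # 3

# Constraints (a) H(alpha, 1) = alpha and (b) H(1, x) = x on a few values.
for H in (h1, h2, h3):
    ok_a = all(H(alpha, 1) == alpha for alpha in range(1, 6))
    ok_b = all(H(1, x) == x for x in range(0, 6))
    print(H.__name__, ok_a, ok_b)
```

On these values only `h1` fails constraint (a), which matches the reason given in the text for discarding it.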

The simultaneous presence of both weights and distances makes the
framework somewhat complex. Furthermore, when GB is a simple base of
weighted goals (without distances) $GB = \{(\alpha_1, G_1), \dots, (\alpha_n, G_n)\}$, we know that
$disu_{GB}(w)$ can be computed in polynomial time, whereas it is not the case with
distance-based goals, because in general $d(w, G_i)$ cannot be computed in polynomial
time.

In the sequel we show that a WD goal base can be translated into a base of 
weighted goals, while preserving the disutility function. This transformation is 
generally not polynomial; however, in some cases it is, especially when goals are 
expressed as DNF formulas.

For this we start by giving a translation from distance-based weighted goals 
to standard weighted goals (thus getting rid of distances). We first need the 
following definitions of discs [13] and circles around a formula:

Definition 4 Let $G \in \mathcal{L}$, d a weak distance and $i \in \mathbb{N}$. The disc of radius i
around G is the formula (unique up to logical equivalence) $D(G, i, d)$ such that
$Mod(D(G, i, d)) = \{w \in \Omega \mid d(w, G) \le i\}$ and the circle of radius i around G is the
formula (unique up to logical equivalence) $C(G, i, d)$ such that $Mod(C(G, i, d)) =
\{w \in \Omega \mid d(w, G) = i\}$.

Clearly, we have (1) $D(G, 0, d) \equiv C(G, 0, d) \equiv G$; (2) $D(G, i, d) \models D(G, i + 1, d)$;
(3) $D(G, d_{max}, d) \equiv \top$, where $d_{max} = \max_{w, w' \in \Omega} d(w, w')$ (the latter
quantity is defined because $\Omega$ is finite); (4) $D(G_1 \vee \dots \vee G_p, k, d) \equiv D(G_1, k, d) \vee
\dots \vee D(G_p, k, d)$; (5) $C(G, i, d) \equiv D(G, i, d) \wedge \neg D(G, i - 1, d)$; (6) $D(G, i, d) \equiv
C(G, 0, d) \vee \dots \vee C(G, i, d)$.
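
Since discs and circles are only defined up to logical equivalence, a simple way to experiment with them is to represent a formula by its set of models. The sketch below does so by brute force over a three-symbol language; the encoding of worlds as tuples and the function names are assumptions of this example, not notation from the paper.

```python
from itertools import product

SYMBOLS = ("a", "b", "c")
OMEGA = list(product((False, True), repeat=len(SYMBOLS)))  # worlds as tuples

def hamming(w, w2):
    return sum(x != y for x, y in zip(w, w2))

def dist(d, w, models):
    """d(w, G), where G is represented by its nonempty set of models."""
    return min(d(w, m) for m in models)

def disc(models, i, d):
    """Models of D(G, i, d): worlds at distance at most i from G."""
    return {w for w in OMEGA if dist(d, w, models) <= i}

def circle(models, i, d):
    """Models of C(G, i, d): worlds at distance exactly i from G."""
    return {w for w in OMEGA if dist(d, w, models) == i}

G = {w for w in OMEGA if w[0] and w[1]}     # models of a AND b
d_max = max(hamming(w, w2) for w in OMEGA for w2 in OMEGA)

sizes = [len(disc(G, i, hamming)) for i in range(d_max + 1)]
print(sizes)   # [2, 6, 8, 8]
```

The disc sizes grow from the two models of the goal to the whole of Ω, in line with properties (1) to (3); the remaining properties can be checked in the same way by comparing model sets.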

Proposition 1 Let $GB = \{(\alpha, d, G)\}$ be a singleton goal base, and let $\Phi_1(GB)$,
$\Phi_2(GB)$ and $\Phi_3(GB)$ be the three distance-free goal bases defined as follows.

- let $\Phi_1(GB) = \{(1, \neg C(G, 1, d)), (2, \neg C(G, 2, d)), \dots, (\alpha, \neg C(G, \alpha, d))\}$. If
$H = H_1$ then $disu_{GB} = disu_{\Phi_1(GB)}$.

- let $\Phi_2(GB) = \{(\alpha, \neg C(G, 1, d)), (\alpha + 1, \neg C(G, 2, d)), \dots, (\alpha + d_{max} - 1,
\neg C(G, d_{max}, d))\}$. If $H = H_2$ then $disu_{GB} = disu_{\Phi_2(GB)}$.

- let $\Phi_3(GB) = \{(\alpha, \neg C(G, 1, d)), (2\alpha, \neg C(G, 2, d)), \dots, (d_{max} \cdot \alpha,
\neg C(G, d_{max}, d))\}$. If $H = H_3$ then $disu_{GB} = disu_{\Phi_3(GB)}$.
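
The translation can be checked by brute force on a small example. The sketch below covers the two operators retained in the text, with weights $\alpha + i - 1$ and $i \cdot \alpha$ on the circle of radius i; formulas are represented by their model sets, the penalties of the distance-free base are aggregated with +, and all names are hypothetical.

```python
from itertools import product

OMEGA = list(product((False, True), repeat=3))   # worlds over three symbols

def hamming(w, w2):
    return sum(x != y for x, y in zip(w, w2))

def dist(w, models):
    return min(hamming(w, m) for m in models)

def h2(alpha, x):
    return 0 if x == 0 else alpha + x - 1

def h3(alpha, x):
    return alpha * x

D_MAX = max(hamming(w, w2) for w in OMEGA for w2 in OMEGA)

def circle(models, i):
    return {w for w in OMEGA if dist(w, models) == i}

def translate(alpha, models, weight_of):
    """Distance-free base with one weighted goal 'not C(G, i, d)' per radius i.

    Each entry is (weight, violators): the goal is violated exactly by the
    worlds lying on the circle of radius i around G.
    """
    return [(weight_of(alpha, i), circle(models, i))
            for i in range(1, D_MAX + 1)]

def disu_weighted(base, w):
    """Sum of the weights of the goals violated by w (F = +)."""
    return sum(weight for weight, violators in base if w in violators)

G = {w for w in OMEGA if w[0] and w[1]}   # models of the goal a AND b
alpha = 4
phi2 = translate(alpha, G, h2)
phi3 = translate(alpha, G, h3)

print([weight for weight, _ in phi2])   # [4, 5, 6]
print([weight for weight, _ in phi3])   # [4, 8, 12]
print(all(disu_weighted(phi3, w) == h3(alpha, dist(w, G)) for w in OMEGA))  # True
```

Because each world lies on exactly one circle, at most one of the translated goals is violated at a time, so the aggregation operator plays no role for a singleton base.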

Thus, for $H \in \{H_1, H_2, H_3\}$, a WD goal base can be rewritten equivalently
into a distance-free goal base. Unfortunately, there is no guarantee that the size
of $\Phi_1(GB)$, $\Phi_2(GB)$, $\Phi_3(GB)$ should be polynomial, because in general $C(G, i, d)$
has a size exponential in |G|. However, if G is under DNF and $d = d_H$ then we
have the following result:

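Proposition 2 [13]

Let $d = d_H$ and $G = \gamma_1 \vee \dots \vee \gamma_p$ be a propositional formula under DNF, where,
for each i, $\gamma_i = l^i_1 \wedge l^i_2 \wedge \dots \wedge l^i_{q_i}$ is a consistent conjunction of literals. Then

$D(G, k, d_H) \equiv \bigvee_{i = 1 \dots p} \ \bigvee_{I \subseteq \{1, \dots, q_i\},\ |\{1, \dots, q_i\} \setminus I| \le k} \ \bigwedge_{j \in I} l^i_j$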

Note that, written this way, $D(G, k, d_H)$ still has an exponential size. However,
it does not have to be written this way. By enriching the syntax of propositional
logic with so-called cardinality-formulas [4], the size of $D(G, k, d_H)$ becomes linear in
the size of G. Therefore, if G is initially under DNF and d is taken to be the Hamming
distance, computing $D(G, k, d_H)$, and therefore computing $C(G, k, d_H)$
and $d(w, G)$, can all be done in polynomial time. Moreover we easily get an upper
bound of $d_H(w, G)$ since $D(G, k, d_H) \equiv \top$ as soon as k reaches the number
of literals of the longest of the $\gamma_i$'s, i.e. as soon as $k \ge \max_i |\gamma_i|$.
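
The sketch below illustrates this result on a small DNF, assuming that the disc of radius k around a cube is obtained by dropping at most k of its literals. Cubes are encoded as dictionaries from symbols to the required truth value (an encoding chosen for this example), the explicit DNF for the disc is compared with a brute-force computation of the distance, and a direct polynomial computation of $d_H(w, G)$ is included for comparison; all names are hypothetical.

```python
from itertools import combinations, product

SYMBOLS = ("a", "b", "c", "d")
OMEGA = [dict(zip(SYMBOLS, v))
         for v in product((False, True), repeat=len(SYMBOLS))]

def satisfies_cube(w, cube):
    return all(w[s] == val for s, val in cube.items())

def satisfies_dnf(w, dnf):
    return any(satisfies_cube(w, cube) for cube in dnf)

def disc_dnf(dnf, k):
    """Explicit DNF for D(G, k, d_H): sub-cubes missing at most k literals.

    The empty cube (all literals dropped) stands for the tautology.
    """
    result = []
    for cube in dnf:
        literals = list(cube.items())
        keep_at_least = max(len(literals) - k, 0)
        for size in range(keep_at_least, len(literals) + 1):
            for kept in combinations(literals, size):
                result.append(dict(kept))
    return result

def hamming_to_dnf_brute(w, dnf):
    """d_H(w, G) by enumeration of the models of G."""
    return min(sum(w[s] != m[s] for s in SYMBOLS)
               for m in OMEGA if satisfies_dnf(m, dnf))

def hamming_to_dnf_fast(w, dnf):
    """d_H(w, G) in time linear in |G|: fewest violated literals over the cubes."""
    return min(sum(w[s] != val for s, val in cube.items()) for cube in dnf)

# G = (a AND b AND NOT c) OR (NOT a AND d)
G = [{"a": True, "b": True, "c": False}, {"a": False, "d": True}]

w = {"a": True, "b": False, "c": True, "d": False}
print(hamming_to_dnf_brute(w, G), hamming_to_dnf_fast(w, G))   # 2 2
print(satisfies_dnf(w, disc_dnf(G, 1)), satisfies_dnf(w, disc_dnf(G, 2)))  # False True
```

Note that `disc_dnf` enumerates sub-cubes explicitly and therefore has exponential size in general; it only serves to check the characterization, while `hamming_to_dnf_fast` reflects the polynomial-time computation mentioned in the text.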

Unfortunately, what we gain in computation by using the Hamming distance, we lose in
expressivity and in robustness to the choice of the propositional symbols. In the following
Section we briefly discuss the drawbacks of the Hamming distance and propose a much
more general family of weak distances (containing $d_H$) retaining these friendly
computational properties of the Hamming distance.

4 Deconstructing Hamming: Topic-Decomposable and 
Choquet-Decomposable Weak Distances 

4.1 Against Hamming 

It is surprising to see how popular the Hamming distance is in the Knowledge
Representation community^4 - not only is it widely used, but it is almost the only
serious propositional distance considered. However, there are good reasons not to

^4 In this research community it is sometimes called Dalal’s distance after it was used
in [6] for belief revision; it was used as well by Forbus [9] for belief update, and by
many authors for belief merging [15] [16] [10] [2].
choose it. The Hamming distance assumes not only that propositional symbols
are equally relevant to determining a distance between worlds (which can be fixed
easily by assigning weights to propositional symbols), but also that they are independent
from each other and that nothing else is relevant to the determination
of the distance between worlds. These assumptions are extremely restrictive and
give the Hamming distance very little flexibility. The most serious of its drawbacks
is its extreme sensitivity to the syntax, or more precisely, to the choice
of the propositional language. There are obviously infinitely many possible
choices for the set of propositional symbols used to represent a given
piece of knowledge, and the Hamming distance is extremely
sensitive to this choice. Let us consider an example. Assume that we talk
about a variable X which can take any integer value between 0 and 7. The most
efficient propositional encoding of this variable uses 3 propositional variables $x_1$,
$x_2$, $x_3$ being respectively the 1st, 2nd and 3rd bit of the binary representation of
the value of X: for instance, X = 6 is represented by the propositional formula
$x_1 \wedge x_2 \wedge \neg x_3$. The Hamming distance is extremely counterintuitive here: let $w_1$,
$w_2$ and $w_3$ be three worlds where X takes respectively the values 3, 4 and 7, all
other things being equal. Then $d_H(w_1, w_2) = 3$ while $d_H(w_1, w_3) = 1$. The introduction
of weights on the symbols does not help much ($d_H(w_1, w_2)$ will still be
maximal). The less efficient choice of representing X with 8 propositional variables
X = 0, ..., X = 7 is not better since it gives $d_H(w_1, w_2) = d_H(w_1, w_3) = 2$
(two worlds that differ only in the value of X always differ on exactly two of these symbols).

Finding a representation such that $d_H(w, w') = |v - v'|$ if w and w' differ only
in the values v and v' they assign to X is far from being obvious^5.
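
The sensitivity to the encoding is easy to reproduce. The sketch below encodes the values of X either with three bits (most significant bit first, matching the formula given for X = 6) or with one symbol per value, and compares Hamming distances with the numerical differences; the function names are hypothetical and only the part of the worlds that encodes X is represented.

```python
def binary_world(value, bits=3):
    """Truth values of (x1, x2, x3), with x1 the most significant bit."""
    return tuple(bool((value >> (bits - 1 - i)) & 1) for i in range(bits))

def one_hot_world(value, size=8):
    """Truth values of the symbols 'X = 0', ..., 'X = 7'."""
    return tuple(v == value for v in range(size))

def hamming(w, w2):
    return sum(x != y for x, y in zip(w, w2))

print(binary_world(6))                                   # (True, True, False)
print(hamming(binary_world(3), binary_world(4)))         # 3
print(hamming(binary_world(3), binary_world(7)))         # 1
print(hamming(one_hot_world(3), one_hot_world(4))
      == hamming(one_hot_world(3), one_hot_world(7)))    # True
```

With the binary encoding the values 3 and 4, which are numerically adjacent, are at maximal Hamming distance, while with one symbol per value all pairs of distinct values are equidistant, so neither encoding reflects $|v - v'|$.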

In the rest of this Section we propose much more general classes of distances 
containing the Hamming distance and its weighted variants, and which retain as 
much as possible its computationally friendly properties discussed in Section 3. 



4.2 Topic-Decomposable Weak Distances 

It is often the case that the set of propositional variables intuitively consists of 
the union of sets of variables, each of which corresponds to a specific topic (see 
e.g., [17]). It is not necessary to require that topics should be disjoint. 

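Definition 5 Let $\mathcal{T} = \{T_1, \dots, T_p\}$ be a collection of nonempty subsets of PS
(topics) such that $\bigcup_i T_i = PS$. A weak distance d is $\mathcal{T}$-decomposable if and only
if there exist p weak distances $d_1 : 2^{T_1} \times 2^{T_1} \to \mathbb{N}$, ..., $d_p : 2^{T_p} \times 2^{T_p} \to \mathbb{N}$,
and an aggregation function $F : \mathbb{N}^p \to \mathbb{N}$, such that for all $w, w' \in \Omega$,

$d(w, w') = F(d_1(w^{\rightarrow T_1}, w'^{\rightarrow T_1}), \dots, d_p(w^{\rightarrow T_p}, w'^{\rightarrow T_p}))$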

A topic can be considered as a criterion, hence for the choice of the function 
F it is relevant to make use of aggregation functions from the MCDM literature. 

The simplest possible topic is a singleton and the “most” decomposable dis- 
tances are those which are decomposable with respect to the set of singletons. 
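
As a hedged illustration of topic-decomposability, the sketch below groups the three bits encoding an integer variable X into one topic, equipped with the distance $|v - v'|$, next to a singleton topic for an unrelated symbol. The symbol names (`x1`, `x2`, `x3`, `rain`), the choice of + as aggregation function and the function names are all assumptions made for this example.

```python
def project(w, topic):
    """Projection of the world w on the symbols of a topic."""
    return {s: w[s] for s in topic}

def value_of(bits):
    """Integer encoded by x1 x2 x3, with x1 the most significant bit."""
    return 4 * bits["x1"] + 2 * bits["x2"] + bits["x3"]

def d_value(u, v):
    """Weak distance on the topic {x1, x2, x3}: difference of encoded values."""
    return abs(value_of(u) - value_of(v))

def d_symbol(u, v):
    """Weak distance on a singleton topic: 0 if equal, 1 otherwise."""
    return int(u != v)

TOPICS = [(("x1", "x2", "x3"), d_value), (("rain",), d_symbol)]

def topic_distance(w, w2, topics=TOPICS, F=sum):
    """d(w, w') = F(d_1(projections on T_1), ..., d_p(projections on T_p))."""
    return F([d(project(w, t), project(w2, t)) for t, d in topics])

def world(value, rain):
    return {"x1": bool(value & 4), "x2": bool(value & 2),
            "x3": bool(value & 1), "rain": rain}

print(topic_distance(world(3, False), world(4, False)))  # 1
print(topic_distance(world(3, False), world(7, False)))  # 4
print(topic_distance(world(3, False), world(7, True)))   # 5
```

Treating the bits of X as a single topic removes the encoding artefact discussed above: the worlds where X is 3 and 4 are now close, whatever the underlying binary representation.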

^5 Nevertheless, it is possible, but this representation is not intuitive at all.
Definition 6 A weak distance d is symbol-decomposable if and only if (1) it is
ST-decomposable where $ST = \{\{s_1\}, \{s_2\}, \dots, \{s_n\}\}$ (2) for each $i \in \{1, \dots, n\}$,
$d_i((s_i), (s_i)) = d_i((\neg s_i), (\neg s_i)) = 0$ and $d_i((s_i), (\neg s_i)) = 1$.

More generally, a topic may be the set of propositional symbols used for
encoding the possible values of a non-binary variable. For instance, if the domain
of variable v is {1, 2, 3} then we may use the propositional symbols {v = 1, v = 2,
v = 3}^6 together with the integrity constraint stating that one and only one
among v = 1, v = 2, v = 3 is true^7. Still more generally, a topic may concern
several variables and thus be the union of the propositional symbols they generate.

In the definition of a topic-decomposable distance, nothing was said on the 
aggregation function F. Clearly, how $d_1, \dots, d_p$ should be aggregated into d is a
typical issue of multicriteria decision making. In this paper we focus on the par- 
ticular class of topic-decomposable distances for which the aggregation function 
is a Choquet integral [19] [18]. 

4.3 Choquet-Decomposable Weak Distances 

Definition 7 (fuzzy measures) A (unnormalized, integer-valued) fuzzy measure
on $\mathcal{T} = \{T_1, \dots, T_p\}$ is a mapping $\mu : 2^{\mathcal{T}} \to \mathbb{N}$ such that (1) $\mu(\emptyset) = 0$; (2)
for any subsets A and B of $\mathcal{T}$, $A \subseteq B$ implies $\mu(A) \le \mu(B)$^8. A fuzzy measure
is said to be strictly positive if and only if for all $X \subseteq \mathcal{T}$, $X \ne \emptyset$ implies $\mu(X) > 0$.



Definition 8 (Choquet integrals) [19] [18] Let $\mu$ be a fuzzy measure on $2^{\mathcal{T}}$
and $x = (x_1, \dots, x_p)$ a vector of real numbers (with $p = |\mathcal{T}|$). Let $\sigma$ be any
permutation of $\{1, \dots, p\}$ such that $x_{\sigma(1)} \le \dots \le x_{\sigma(p)}$. The Choquet integral of
$(x_1, \dots, x_p)$ with respect to $\mu$ is defined as

$C_\mu(x_1, \dots, x_p) = \sum_{i=1}^{p} \left[ \mu(\{\sigma(i), \dots, \sigma(p)\}) - \mu(\{\sigma(i+1), \dots, \sigma(p)\}) \right] \cdot x_{\sigma(i)}$
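
A direct transcription of this definition may help to see how the measure drives the aggregation. In the sketch below a fuzzy measure is assumed to be given as a Python function on frozensets of (0-based) criterion indices, which is an encoding chosen for this example; the helper names are hypothetical.

```python
def choquet(mu, xs):
    """Choquet integral of xs = (x_1, ..., x_p) with respect to mu.

    mu maps a frozenset of indices to a number, with mu(frozenset()) == 0
    and mu monotone with respect to set inclusion.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])   # sigma: ascending values
    total = 0
    for rank, i in enumerate(order):
        upper = frozenset(order[rank:])           # {sigma(rank), ..., sigma(p)}
        upper_next = frozenset(order[rank + 1:])  # same set without sigma(rank)
        total += (mu(upper) - mu(upper_next)) * xs[i]
    return total

def additive(weights):
    """Additive measure: mu(A) is the sum of the weights of the elements of A."""
    return lambda subset: sum(weights[i] for i in subset)

def cardinality_based(values):
    """Measure depending only on |A|: values[k] is the measure of any k-subset."""
    return lambda subset: values[len(subset)]

xs = (3, 1, 2)
print(choquet(additive((1, 1, 1)), xs))              # 6  (sum)
print(choquet(additive((2, 1, 3)), xs))              # 13 (weighted sum)
print(choquet(cardinality_based((0, 1, 1, 1)), xs))  # 3  (maximum)
print(choquet(cardinality_based((0, 0, 0, 1)), xs))  # 1  (minimum)
```

Additive measures give weighted sums, while measures that depend only on the cardinality of their argument give order statistics such as the minimum and the maximum, which is why the arithmetic mean is only one particular point of this family.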

Definition 9 A weak distance d is $\mathcal{T}$-Choquet-decomposable if and only if
it is $\mathcal{T}$-decomposable and the associated aggregation function F is a Choquet
integral.

In other terms, a Choquet-decomposable weak distance d is defined by the
collection of elementary weak distances $d_i$ and a fuzzy measure $\mu : 2^{\mathcal{T}} \to \mathbb{N}$
such that for any $w, w'$, we have $d(w, w') = C_\mu(d_1(w, w'), \dots, d_p(w, w'))$.

^6 Note that in spite of the symbol = appearing in them, they are atomic propositional
symbols.

^7 Note that in practice, it would be more efficient to encode a 3-valued variable with
two propositional symbols only: it is well known that a variable with a value
domain of cardinality k requires $\lceil \log_2 k \rceil$ propositional symbols, i.e., the upper integer part of the logarithm
in base 2 of k.

^8 Usually fuzzy measures are also required to satisfy $\mu(\mathcal{T}) = 1$; however this requirement
has little impact.
Note that if, for all i, $d_i$ is a weak distance on $T_i$, then it is not necessarily
the case that the aggregated distance d is still a weak distance, because it may
fail to satisfy the separation property (Sep). The necessary and sufficient condition is that any nonempty
subset of $\mathcal{T}$ has a non-null measure:

Proposition 3 Let $d_1, \dots, d_p$ be weak distances on $T_1, \dots, T_p$ respectively, $\mu$
a fuzzy measure on $\mathcal{T}$ and d defined by $d(w, w') = C_\mu(d_1(w, w'), \dots, d_p(w, w'))$.
Then d is a weak distance if and only if $\mu$ is strictly positive.

The problem of whether the triangular inequality (TI) is transferred from
the $d_i$'s to d is more complex. In general, it is not true. A sufficient (but not
necessary) condition on $\mu$ under which d satisfies (TI) as soon as $d_1, \dots, d_p$ do is
the following:

$\forall (x_1, \dots, x_p)\ \forall (y_1, \dots, y_p)$, $C_\mu(x_1, \dots, x_p) + C_\mu(y_1, \dots, y_p) \ge C_\mu(x_1 + y_1, \dots, x_p + y_p)$

We now focus on a particularly interesting class of weak distances.



4.4 SC-Decomposable Weak Distances 

Intuitively, d is SC-decomposable iff it is both symbol-decomposable and Choquet-decomposable.

Definition 10 A weak distance d is SC-decomposable if and only if (1) it is
symbol-decomposable and (2) the associated aggregation function F is a Choquet
integral.

These weak distances can easily be characterized: 

Proposition 4. d is SC-decomposable if and only if there exists a fuzzy measure
$\mu$ such that $d(w, w') = \mu(Diff(w, w'))$.

From now on we denote by $d_\mu$ the SC-decomposable function defined by
$d_\mu(w, w') = \mu(Diff(w, w'))$.

A special case of interest is when $\mu$ is additive. In this case we have
$d_\mu(w, w') = \sum_{s \in Diff(w, w')} \mu(\{s\})$. Therefore, $d_\mu$ is a weighted Hamming distance.
If furthermore we have $\mu(\{i\}) = \mu(\{j\}) = 1$ for all i, j, then $d_\mu(w, w') =
|Diff(w, w')|$, i.e., $d_\mu$ is the Hamming distance $d_H$.
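
The characterization of SC-decomposable distances as measures of the set of differing symbols can be illustrated as follows. The sketch compares an additive measure (a weighted Hamming distance) with a non-additive one in which the symbols b and c are treated as redundant; both measures, the weights and the function names are hypothetical choices for this example, and the Choquet integral is recomputed from its definition to check that the two formulations agree.

```python
from itertools import product

SYMBOLS = ("a", "b", "c")

def diff(w, w2):
    return frozenset(s for s in w if w[s] != w2[s])

def d_mu(mu, w, w2):
    """SC-decomposable distance: measure of the set of differing symbols."""
    return mu(diff(w, w2))

def additive(weights):
    return lambda symbols: sum(weights[s] for s in symbols)

def mu_redundant(symbols):
    """Non-additive measure: b and c jointly count for 1, a counts for 3."""
    return (3 if "a" in symbols else 0) + (1 if symbols & {"b", "c"} else 0)

def choquet(mu, xs):
    """Choquet integral of a dict symbol -> value with respect to mu."""
    order = sorted(xs, key=xs.get)
    return sum((mu(frozenset(order[r:])) - mu(frozenset(order[r + 1:]))) * xs[s]
               for r, s in enumerate(order))

mu_weighted = additive({"a": 3, "b": 1, "c": 1})
w = {"a": False, "b": False, "c": False}
w2 = {"a": False, "b": True, "c": True}

print(d_mu(mu_weighted, w, w2))    # 2
print(d_mu(mu_redundant, w, w2))   # 1
```

Under the non-additive measure, changing both b and c costs no more than changing one of them, an interaction between symbols that no weighted Hamming distance can express.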

This sheds some light on which assumptions are implicitly made when choosing
the Hamming distance, namely:

1. symbol-decomposability: criteria or topics are identified with single proposi- 
tional symbols. 

2. 1-additivity: propositional symbols are fully independent. 

3. neutrality: propositional symbols have equal importance. 

Therefore, the position of the Hamming distance in the set of distances is
very similar to the position of the arithmetic mean in the set of aggregation
functions: namely, it is central, but the choice of $d_H$ relies on several very strong
assumptions without which it is far too arbitrary to be a good choice.
Apart from additive measures, another interesting subclass of measures is the
set of measures $\mu$ such that $\mu(A)$ depends only on the cardinality of A. In this
case, the Choquet integral associated with $\mu$ is an Ordered Weighted Average
(OWA) [22]. This is equivalent to saying that an (integer-valued, unnormalized)
OWA is characterized by a vector $a = (a_1, \dots, a_p)$^9 such that $C_\mu(x_1, \dots, x_p) =
\sum_{i=1}^{p} a_i \cdot x_{\sigma(i)}$. $d_\mu$ is a weak distance if and only if $\mu$ is strictly positive, which
is equivalent to $a_p > 0$.

The weak distances induced by OWAs include

- the drastic distance $d_\delta$, associated with $a = (0, \dots, 0, 1)$;

- the Hamming distance $d_H$, associated with $a = (1, \dots, 1)$;

- the bounded Hamming distances $d_H^k$ defined by $d_H^k(w, w') =
\min(k, d_H(w, w'))$ where $k \in \{1, \dots, p\}$ (remark that $d_H^1 = d_\delta$ and
that $d_H^p = d_H$).
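
These three cases can be checked with a short sketch. It assumes that the OWA weights apply to the 0/1 symbol distances sorted in ascending order, and that the bounded Hamming distance with bound k corresponds to the vector made of p - k zeros followed by k ones; this last vector is derived for the example and is not stated in the text. The function names are hypothetical.

```python
from itertools import product

SYMBOLS = ("a", "b", "c", "d")
P = len(SYMBOLS)

def owa_distance(a, w, w2):
    """Sum of a[i] * x_sigma(i), where x holds the 0/1 symbol distances in ascending order."""
    xs = sorted(int(w[s] != w2[s]) for s in SYMBOLS)
    return sum(ai * xi for ai, xi in zip(a, xs))

def hamming(w, w2):
    return sum(w[s] != w2[s] for s in SYMBOLS)

drastic_vector = (0,) * (P - 1) + (1,)
hamming_vector = (1,) * P

def bounded_vector(k):
    """Assumed OWA vector for min(k, d_H): p - k zeros followed by k ones."""
    return (0,) * (P - k) + (1,) * k

w = {"a": True, "b": True, "c": False, "d": False}
w2 = {"a": False, "b": False, "c": True, "d": False}

print(owa_distance(drastic_vector, w, w2))      # 1
print(owa_distance(hamming_vector, w, w2))      # 3
print(owa_distance(bounded_vector(2), w, w2))   # 2
```

The two extreme bounds recover the drastic distance (k = 1) and the Hamming distance (k = p), as remarked in the list.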

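Proposition 5 Let $d = d_\mu$ and $G = \gamma_1 \vee \dots \vee \gamma_p$ be a propositional formula
under DNF, where, for each i, $\gamma_i = l^i_1 \wedge l^i_2 \wedge \dots \wedge l^i_{q_i}$ is a consistent conjunction
of literals. Then

$D(G, k, d_\mu) \equiv \bigvee_{i = 1 \dots p} \left( \bigvee_{I \subseteq \{1, \dots, q_i\},\ \mu(T \setminus I) \le k} \ \bigwedge_{j \in I} l^i_j \right)$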

Like discs around the DNF for $d_H$, $D(G, k, d_\mu)$ should not be explicitly written
this way (or else it has an exponential size). Slightly generalizing cardinality-formulas
by introducing the syntactical notation $(k, \mu) : \{l_1, \dots, l_q\}$ - such a
formula is satisfied if $\mu(\{s_1, \dots, s_q\}) \le k$, where $s_i$ is the propositional symbol
associated with $l_i$ - $D(G, k, d_\mu)$ can be written with a size linear in the size of G.

5 Conclusion 

The contribution of this paper to the logical representation of preference is
twofold. First, we proposed a unified framework enabling the representation of 
weighted goals and distance-labelled goals, extending the approach first pro- 
posed in [13]. Second, we criticized the frequent use of the Hamming distance 
for the representation of preference and proposed a much more general family 
of distances based on Choquet integrals. 
Abstract. Partially Observable Markov Decision Processes (POMDPs) provide
an elegant framework for AI planning tasks with uncertainties. Value iteration is 
a well-known algorithm for solving POMDPs. It is notoriously difficult because 
at each step it needs to account for every belief state in a continuous space. In 
this paper, we show that value iteration can be conducted over a subset of belief 
space. Then, we study a class of POMDPs, namely informative POMDPs, where 
each observation provides good albeit incomplete information about world states. 
For informative POMDPs, value iteration can be conducted over a small subset 
of belief space. This yields two advantages: First, fewer vectors are needed to
represent value functions. Second, value iteration can be accelerated. Empirical
studies are presented to demonstrate these two advantages. 



1 Introduction 

Partially Observable Markov Decision Processes (POMDPs) provide a general framework
for AI planning problems where effects of actions are nondeterministic and the
state of the world is not known with certainty. Unfortunately, solving general POMDPs
is computationally intractable (e.g., [12]). Although much recent effort has been devoted
to finding efficient algorithms for POMDPs, there is still a long way to go before
realistic problems can be solved.

Value iteration [13] is a standard algorithm for solving POMDPs. It conducts a sequence
of dynamic programming (DP) updates to improve values for each belief state in
belief space. Because there are uncountably many belief states, DP updates
and hence value iteration are computationally prohibitive in practice.

In this paper, we propose to conduct DP updates and hence value iteration over a 
subset of belief space. The subset is referred to as belief subspace or simply subspace. It 
consists of all possible belief states the agent encounters. As value iteration is conducted 
over the subspace, each DP update accounts for only belief states in the subspace. The 
hope is that DP updates over a subset should be more efficient than those over the entire 
belief space. However, for general POMDPs, it is difficult to represent this subspace and 
perform implicit DP updates. Furthermore, occasionally the subspace could be as large 
as the original belief space. In this case, value iteration over subspace actually provides 
no benefits at all. 

We study a class of special POMDPs, namely informative POMDPs, where any 
observation can restrict the world into a small set of states. For informative POMDPs, 



S. Benferhat and P. Besnard (Eds.): ECSQARU 2001, LNAI 2143, pp. 60-71, 2001. 
© Springer- Verlag Berlin Heidelberg 2001 




Value Iteration over Belief Subspace 






the subspace has clear semantics and can be represented in an easy way. Based on 
the subspace representation, we describe how to conduct implicit value iteration over 
subspace for informative POMDPs. Moreover, in informative POMDPs, the subspace 
is expected to be much smaller than the belief space. One expects value iteration over 
subspace can be more efficient than over belief space. 

Informative POMDPs occupy a middle ground in terms of how informative observations 
are. At one extreme, unobservable POMDPs assume that observations do 
not provide any information about world states and cannot restrict the world into any 
range of states (e.g., [11]). At the other extreme, fully observable MDPs assume that 
an observation restricts the world to a unique state. 

The rest of the paper is organized as follows. In the next section, we introduce 
background knowledge and conventional notations. In Section 3, we develop a procedure 
for explicitly conducting value iteration over subspace. In Section 4, we discuss prob- 
lem characteristics of informative POMDPs and problem examples in the literature. In 
Section 5, we show how the problem characteristics can be exploited. Section 6 reports 
the experiments on comparing value iteration over subspace and the standard algorithm. 
Section 7 briefly examines related work. Finally, Section 8 concludes the paper with 
some future directions. 

2 Background 

A POMDP is a sequential decision model for an agent who acts in a stochastic envi- 
ronment with only partial knowledge about the states of the world. The environment is 
described by a set of states S. The agent changes world states by executing one of a 
finite set of actions A. At each point in time, the world is in one state s. Based on the 
information it has, the agent chooses and executes an action a. Consequently, it receives 
an immediate reward $r(s, a)$ and the world moves stochastically into another state $s'$. 
The transition probability is $P(s'|s, a)$. Thereafter, the agent receives an observation $z$ 
from a finite set $Z$ of observations randomly. The observation probability is $P(z|s', a)$. 
The process repeats itself. 

Information that the agent has about the current state of the world can be summarized 
by a probability distribution over $S$ [1]. The probability distribution is called a belief 
state and denoted by b. The set of all possible belief states is called the belief space and 
denoted by B. If the agent observes z after taking action a in belief state b, its next belief 
state b' is updated as 



\[ b'(s') = k\, P(z|s', a) \sum_{s} P(s'|s, a)\, b(s) \tag{1} \]

where $k$ is a re-normalization constant. Sometimes $b'$ is denoted by $\tau(b, a, z)$. 
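
The belief update of Equation (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the dictionary layout of the transition and observation models (`P_trans`, `P_obs`) and the function name `belief_update` are assumptions made for this example, not part of the paper.

```python
def belief_update(b, a, z, P_trans, P_obs):
    """Return tau(b, a, z), the next belief state, following Equation (1).

    b       : dict state -> probability (current belief state)
    P_trans : dict (s, a) -> dict s' -> P(s'|s, a)   (hypothetical layout)
    P_obs   : dict (s', a) -> dict z -> P(z|s', a)   (hypothetical layout)
    """
    states = list(b)
    unnormalized = {}
    for s_next in states:
        # sum_s P(s'|s, a) b(s): probability of landing in s' before observing
        reach = sum(P_trans[(s, a)].get(s_next, 0.0) * b[s] for s in states)
        unnormalized[s_next] = P_obs[(s_next, a)].get(z, 0.0) * reach
    total = sum(unnormalized.values())  # equals 1/k in Equation (1)
    if total == 0.0:
        raise ValueError("observation z has zero probability after (b, a)")
    return {s: p / total for s, p in unnormalized.items()}


# Toy two-state model (invented for illustration): the action keeps the state.
P_trans = {("x", "stay"): {"x": 1.0}, ("y", "stay"): {"y": 1.0}}
P_obs = {("x", "stay"): {"o": 0.9, "n": 0.1},
         ("y", "stay"): {"o": 0.2, "n": 0.8}}
b_next = belief_update({"x": 0.5, "y": 0.5}, "stay", "o", P_trans, P_obs)
```

The normalizing sum `total` is the probability of receiving $z$, so the function refuses to update on an observation that cannot occur.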

A policy prescribes an action for each possible belief state. In other words, it is a 
mapping from $B$ to $A$. Associated with policy $\pi$ is its value function $V^{\pi}$. For each belief 
state $b$, $V^{\pi}(b)$ is the expected total discounted reward that the agent receives by following 
the policy starting from $b$, i.e. $V^{\pi}(b) = E_{\pi,b}[\sum_{t=0}^{\infty} \lambda^t r_t]$, where $r_t$ is the reward received 
at time $t$ and $\lambda$ ($0 \le \lambda < 1$) is the discount factor. It is known that there exists a policy $\pi^*$ 
such that $V^{\pi^*}(b) \ge V^{\pi}(b)$ for any other policy $\pi$ and any belief state $b$. Such a policy 







W. Zhang 



is called an optimal policy. The value function of an optimal policy is called the optimal 
value function. We denote it by $V^*$. For any positive number $\epsilon$, a policy $\pi$ is $\epsilon$-optimal 
if $V^{\pi}(b) + \epsilon \ge V^*(b)$ for any belief state $b$. 

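The dynamic programming (DP) update operator $T$ maps a value function $V$ to 
another value function $TV$ that is defined as follows: for any $b$ in $B$, 

\[ TV(b) = \max_{a} \Big[ r(b, a) + \lambda \sum_{z} P(z|b, a)\, V(\tau(b, a, z)) \Big] \tag{2} \]

where $r(b, a) = \sum_{s} r(s, a)\, b(s)$ is the expected immediate reward for taking action $a$ in 
belief state $b$, and $P(z|b, a)$ is the probability of observing $z$ after taking action $a$ in 
belief state $b$. 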
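
To make Equation (2) concrete, the following sketch evaluates the right-hand side at a single belief state. It is not the paper's algorithm (which manipulates sets of vectors rather than individual belief states); the function name `dp_backup`, the dictionary-based model layout, and the toy model are assumptions for illustration only.

```python
def dp_backup(b, V, actions, observations, lam, reward, P_trans, P_obs):
    """Evaluate TV(b) from Equation (2) for one belief state b.

    V is any callable mapping a belief dict to a number. The model layout
    (reward[(s, a)], P_trans[(s, a)][s'], P_obs[(s', a)][z]) is hypothetical.
    """
    states = list(b)
    best = float("-inf")
    for a in actions:
        value = sum(reward[(s, a)] * b[s] for s in states)  # r(b, a)
        for z in observations:
            # joint[s'] = P(z|s', a) * sum_s P(s'|s, a) b(s)
            joint = {
                s2: P_obs[(s2, a)].get(z, 0.0)
                * sum(P_trans[(s, a)].get(s2, 0.0) * b[s] for s in states)
                for s2 in states
            }
            p_z = sum(joint.values())  # P(z|b, a)
            if p_z > 0.0:
                b_next = {s2: p / p_z for s2, p in joint.items()}  # tau(b, a, z)
                value += lam * p_z * V(b_next)
        best = max(best, value)
    return best


# Toy model (invented): one action that keeps the state, one observation.
reward = {("x", "stay"): 1.0, ("y", "stay"): 0.0}
P_trans = {("x", "stay"): {"x": 1.0}, ("y", "stay"): {"y": 1.0}}
P_obs = {("x", "stay"): {"o": 1.0}, ("y", "stay"): {"o": 1.0}}
backed_up = dp_backup({"x": 0.5, "y": 0.5}, lambda bel: 2.0 * bel["x"],
                      ["stay"], ["o"], 0.5, reward, P_trans, P_obs)
```

Observations with zero probability are skipped, since they contribute nothing to the sum over $z$.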

Value iteration is an algorithm for finding $\epsilon$-optimal value functions. It starts with an 
initial value function $V_0$ and iterates using the formula: $V_n = TV_{n-1}$. Because $T$ is a 
contraction mapping, $V_n$ converges to $V^*$ as $n$ goes to infinity. Value iteration terminates 
when the Bellman residual $\max_b |V_n(b) - V_{n-1}(b)|$ falls below $\epsilon(1 - \lambda)/2\lambda$. When it 
does, the value function $V_n$ is $\epsilon$-optimal. 
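
The iteration and its stopping rule can be outlined as a generic loop. In this sketch the operator `T`, the `residual` function and the toy scalar example are placeholders invented for illustration; for POMDPs, `T` would be the DP update and `residual` the Bellman residual over belief states.

```python
def value_iteration(T, V0, residual, eps, lam, max_iter=10_000):
    """Iterate V_n = T(V_{n-1}) until the residual falls below
    eps * (1 - lam) / (2 * lam); return the last value function and n."""
    threshold = eps * (1.0 - lam) / (2.0 * lam)
    V_prev = V0
    for n in range(1, max_iter + 1):
        V = T(V_prev)
        if residual(V, V_prev) < threshold:
            return V, n
        V_prev = V
    raise RuntimeError("residual did not fall below the threshold")


# Toy contraction on a single number (hypothetical stand-in for a value
# function): T(V) = 1 + 0.5 * V has fixed point 2.
V_final, n_iter = value_iteration(
    T=lambda V: 1.0 + 0.5 * V,
    V0=0.0,
    residual=lambda V, W: abs(V - W),
    eps=0.01,
    lam=0.5,
)
```

In the toy run the loop stops once consecutive iterates differ by less than 0.005, at which point the iterate is within 0.01 of the fixed point.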

Functions over the state space $S$ are sometimes referred to as vectors. A set $\mathcal{V}$ 
of vectors represents a value function $f$ in a subspace $B'$ of belief states if 
$f(b) = \max_{\alpha \in \mathcal{V}} \alpha \cdot b$ for any $b$ in $B'$. The notation $\alpha \cdot b$ means the inner product of $\alpha$ and $b$. We will 
sometimes write $f(b)$ as $\mathcal{V}(b)$. A vector in a set is extraneous in $B'$ if, after its removal, 
the set represents the same value function. It is useful in $B'$ otherwise. If a value function 
$f$ is representable by a set of vectors in $B'$, there is a minimum set that represents $f$ in 
$B'$. 
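
As a small illustration of how a set of vectors represents a value function, the sketch below evaluates $\max_{\alpha} \alpha \cdot b$ and removes vectors that are dominated componentwise. Note the assumption: componentwise domination is only a cheap sufficient test for extraneousness; identifying every extraneous vector generally requires solving linear programs, which this sketch does not attempt. The function names are hypothetical.

```python
def value(vectors, b):
    """Value of belief b under the function represented by `vectors`."""
    return max(sum(a_i * b_i for a_i, b_i in zip(alpha, b)) for alpha in vectors)


def remove_pointwise_dominated(vectors):
    """Drop vectors that another vector matches or beats in every component.

    Such vectors are extraneous in any subspace. Exact duplicates keep
    only their first occurrence.
    """
    kept = []
    for i, alpha in enumerate(vectors):
        dominated = any(
            j != i
            and all(o >= a for o, a in zip(other, alpha))
            and (other != alpha or j < i)
            for j, other in enumerate(vectors)
        )
        if not dominated:
            kept.append(alpha)
    return kept


vectors = [(1.0, 0.0), (0.0, 1.0), (0.5, -1.0)]
pruned = remove_pointwise_dominated(vectors)
```

Here `(0.5, -1.0)` is removed because `(1.0, 0.0)` is at least as large in both components, and the represented value function is unchanged.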

A value function that is representable as a set of vectors in the entire belief space 
is said to be piecewise linear and convex (PLC). It is known that, if the initial value 
function is PLC, then value functions generated by value iteration are PLC and can be 
represented by a finite set of vectors [14]. 

3 Explicit Value Iteration over Subspace 

In this section, we discuss the procedure of explicitly conducting value iteration over 
subspace. First, we study how it is possible to conduct DP updates over subspace. Second, 
we develop a stopping criterion for value iteration to terminate. Finally, we discuss the 
potential benefits of conducting value iteration over subspace and why it is difficult to 
conduct value iteration over subspace. 

3.1 DP Updates over Subspace 

We are interested in subspaces determined by pairs of actions and observations. To 
define a subspace, we assume that the agent can start from any belief state, executes $a$ at 
the previous time point and observes $z$. All possible belief states the agent can reach form a 
set. The set can be denoted by $\{\tau(b, a, z) \mid b \in B\}$. For simplicity, we abuse our notation 
and denote it by $\tau(B, a, z)$. Obviously, it is a subset of $B$. Next, we show how to relate 
this subset to DP updates. 

In the right-hand side of DP equation (2), since $\tau(b, a, z)$ must belong to the set 
$\tau(B, a, z)$ for each $[a, z]$ pair, the notation $V(\tau(\cdot, a, z))$ can be viewed as a value function 
over the subspace $\tau(B, a, z)$. To make this explicit, we introduce a concept of value 
function over subspace. We use the notation $V_n^{\tau(B,a,z)}$ to denote an $n$-step value function 
over $\tau(B, a, z)$. It maps any belief state in the subspace into the same value as $V_n$ does. 
With this notation, the DP equation can be rewritten as: for any $b$ in $B$, 

\[ V_{n+1}(b) = \max_{a} \Big\{ r(b, a) + \lambda \sum_{z} P(z|b, a)\, V_n^{\tau(B,a,z)}(\tau(b, a, z)) \Big\}. \tag{3} \]



This equation means that the value function $V_{n+1}$ over $B$ can be represented by a set 
of value functions $V_n^{\tau(B,a,z)}$ over subspaces. If this fact is repeatedly applied, we 
conclude that (1) for any $n$, in order to represent $V_{n+1}$, it suffices to maintain a set 
of value functions over subspaces; (2) in order to represent optimal or near optimal 
value function, it suffices to maintain a set of value functions over subspaces when n is 
sufficiently large. 

The analysis suggests another way to formulate a DP update over a set of subspaces 
as follows. 

given a set $\{V_n^{\tau(B,a,z)}\}$ of value functions over subspaces, how to compute 
another set $\{V_{n+1}^{\tau(B,a,z)}\}$ of value functions 

In this formulation, one need not compute a value function over belief space. Instead, 
one can compute a set of value functions over a set of subspaces. 

Collectively, DP update over a set $\{\tau(B, a, z)\}$ of subspaces can be regarded as that 
over a single belief subspace. For this purpose, we need to define the single subspace 
and value function over it. 

- The union of subspaces $\cup_{a,z} \tau(B, a, z)$ for all possible combinations of actions and 
observations consists of all the belief states the agent can encounter. To ease presentation, 
we denote this union by $\tau(B, A, Z)$. Since each subspace in it is a subset 
of $B$, so is $\tau(B, A, Z)$. 

- Given a set of value functions $\{V_n^{\tau(B,a,z)}\}$, we define value function $V_n^{\tau(B,A,Z)}$ over 
subspace $\tau(B, A, Z)$ as follows: for any $b$ in $\tau(B, A, Z)$, 

\[ V_n^{\tau(B,A,Z)}(b) = V_n^{\tau(B,a,z)}(b) \tag{4} \]

where $[a, z]$ is a pair such that $b$ is in the set $\tau(B, a, z)$. In fact, value function 
$V_n^{\tau(B,A,Z)}$ maps each belief state in $\tau(B, A, Z)$ to the same value as $V_n$. 

With these two notations, DP update over subspace $\tau(B, A, Z)$ can be formulated as: 

given a value function $V_n^{\tau(B,A,Z)}$, how to compute $V_{n+1}^{\tau(B,A,Z)}$ 

Under this formulation, at each iteration one needs to compute a value function over a 
subspace rather than the entire belief space. 

3.2 Stopping Criterion 

In value iteration over belief subspace $\tau(B, A, Z)$, since $V_{n+1}^{\tau(B,A,Z)}$ and $V_n^{\tau(B,A,Z)}$ 
specify the same values for belief states in $\tau(B, A, Z)$ as $V_{n+1}$ and $V_n$, the Bellman 
residual between $V_{n+1}^{\tau(B,A,Z)}$ and $V_n^{\tau(B,A,Z)}$ becomes smaller over subspace $\tau(B, A, Z)$ 
as $n$ increases. Therefore a natural criterion is: when the residual between $V_{n+1}^{\tau(B,A,Z)}$ and 
$V_n^{\tau(B,A,Z)}$ over $\tau(B, A, Z)$ falls below $\epsilon(1 - \lambda)/2\lambda$, value iteration terminates. 

When value iteration over subspace terminates, it outputs value function $V_n^{\tau(B,A,Z)}$. 
It can be used to represent value function $V_{n+1}$ over $B$ as in (3). Therefore the quality of 
$V_n^{\tau(B,A,Z)}$ can be measured by that of $V_{n+1}$. The following theorem gives a condition 
under which value iteration over subspace generates value functions of good quality. 
Note that the condition is more restrictive than the aforementioned one. 



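Theorem 1 If value iteration over subspace terminates when 
$\max_{b \in \tau(B,A,Z)} |V_n^{\tau(B,A,Z)}(b) - V_{n-1}^{\tau(B,A,Z)}(b)| \le \epsilon(1 - \lambda)/(2|Z|\lambda)$, 
then the value function $V_{n+1}$ represented by $V_n^{\tau(B,A,Z)}$ is $\epsilon$-optimal. □ 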
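
As a rough numerical illustration of how much stricter the condition of Theorem 1 is, assume the parameter values reported in Section 6 ($\epsilon = 0.01$, $\lambda = 0.95$) and the six observations of the maze problem ($|Z| = 6$). The two thresholds then evaluate to:

```latex
\[
\frac{\epsilon(1-\lambda)}{2\lambda} = \frac{0.01 \times 0.05}{1.9} \approx 2.63 \times 10^{-4},
\qquad
\frac{\epsilon(1-\lambda)}{2|Z|\lambda} = \frac{0.01 \times 0.05}{11.4} \approx 4.39 \times 10^{-5}.
\]
```

The second threshold is smaller by exactly the factor $|Z|$.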



3.3 Benefits and Difficulties 

In value iteration over subspace, DP update of computing $V_{n+1}^{\tau(B,A,Z)}$ needs to account for 
a subset $\tau(B, A, Z)$ of $B$. If the subset $\tau(B, A, Z)$ is much smaller than $B$, one expects: 
First, the set representing value function $V_n^{\tau(B,A,Z)}$ consists of far fewer vectors than 
the set representing $V_n$. Second, since there are fewer vectors representing $V_n^{\tau(B,A,Z)}$, 
fewer linear programs need to be solved for computing these vectors and determining 
their usefulness. This would lead to computational savings. 

However, there are several difficulties in realizing these potential advantages 
for general POMDPs. First, we do not know how to represent subspace $\tau(B, A, Z)$. 
For now, it is just an abstract notation. Second, the subspace $\tau(B, A, Z)$ could occasionally 
be as large as $B$. In this case, value iteration over it does not provide more benefits than 
directly conducting standard DP updates. 

In this regard, informative POMDPs have nice properties which can be exploited. 
First, the subspaces have a clear representation. Second, the set $\tau(B, A, Z)$ is expected 
to be much smaller than belief space $B$. 



4 Problem Characteristics 

In this section, we describe informative POMDPs and argue that they are suitable for 
modeling some realistic applications. 

4.1 Informative POMDPs 

In a POMDP, the agent perceives the world via observations. Starting from any state, if 
the agent executes an action $a$ and receives an observation $z$, the world states can be 
categorized into two classes by the observation model: states the agent can reach and 
states it cannot. Formally, the set of reachable states is $\{s \mid s \in S \text{ and } P(z|s, a) > 0\}$. 
We denote it by $S_{az}$. 

An $[a, z]$ pair is said to be informative if the size $|S_{az}|$ is much smaller than $|S|$. 
Intuitively, if the pair $[a, z]$ is informative, after executing $a$ and receiving $z$, the agent 
knows that the true world states are restricted into a small set. An observation $z$ is said 
to be informative if $[a, z]$ is informative for every action $a$ giving rise to $z$. Intuitively, 
an observation is informative if it always gives the agent a good idea about world 
states regardless of the action executed at previous time point. A POMDP is said to 
be informative if all observations are informative. In other words, any observation the 
agent receives always provides a good idea about world states. Since one observation 
is received at each time point, a POMDP agent always has a good albeit imperfect idea 
about the world. 

4.2 Problem Class 

Consider a robot navigation problem illustrated in Figure 1. In the figure, thick lines 
stand for walls and thin lines for nothing (open). At each time point, the robot reads from 
four sensors to determine its current location. Each sensor informs the robot whether 
there is a wall or nothing along a direction (east, south, west and north). An observation 
is a string of four letters. For example, at location 2, the observation is owow where o 
means nothing and w means wall. We note that the same string is received at location 5. 
Also, the agent receives different strings when it is in any other location. Accordingly, if 
the observation is owow, the agent knows that its location must be at 2 or 5. The following 
table summarizes the possible strings and the states they restrict the world into. We note 
that any observation can restrict the world into at most two locations although the world 
has ten. 



observations   states     observations   states 
owww           {1}        owow           {2,5} 
owoo           {3,4}      wwow           {6} 
wowo           {7,8}      woww           {9,10} 
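
The table can be reproduced by grouping locations by the sensor string they produce. The sketch below does this for the maze; the per-location strings are read off the table above, while the variable and function names are assumptions made for this example.

```python
from collections import defaultdict

# Sensor string received at each maze location, as listed in the table.
OBSERVATION_AT = {
    1: "owww", 2: "owow", 3: "owoo", 4: "owoo", 5: "owow",
    6: "wwow", 7: "wowo", 8: "wowo", 9: "woww", 10: "woww",
}


def states_by_observation(observation_at):
    """Group states by observation: returns dict z -> set of states S_z."""
    groups = defaultdict(set)
    for state, z in observation_at.items():
        groups[z].add(state)
    return dict(groups)


S_z = states_by_observation(OBSERVATION_AT)
largest = max(len(states) for states in S_z.values())
```

With ten locations and six distinct strings, `largest` confirms that no observation leaves more than two candidate locations.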



[Figure: a maze with ten locations numbered 1 to 10; the labels +1 and -1 mark locations 9 and 10] 

Fig. 1. A maze world 



Informative POMDPs are especially suitable for modeling a class of problems. In 
these problems, a state is described by a number of variables (fluents). Some variables are 
observable and others are hidden. The possible assignments to observable variables form 
the observation space. A specific assignment to observable variables restricts the world 
states into a small range of them. A slotted Aloha protocol problem belongs to this class 
[2,5]. In this problem, a system state consists of the number of backlogged messages 
and the channel status. The channel status is observable and its possible assignments 
form the observation space. On the other hand, the system has no access to the number 
of the backlogged messages. If the maximum number of backlogged messages is set 
to $m$ and there are $n$ possible channel statuses, the number of states is $m \cdot n$. A particular 
assignment of the channel status will restrict the system into $m$ states out of $m \cdot n$. A similar 
problem characteristic also exists in a non-stationary environment model proposed for 
reinforcement learning [7]. 



5 Exploiting Problem Characteristics 

In this section, we show how to carry out value iteration over subspace for informative 
POMDPs. We start from subspace representation. 



5.1 Belief Subspace 

A belief simplex is specified by a list of extreme belief states. The simplex with extreme 
points $b_1, b_2, \ldots, b_k$ consists of all belief states of the form $\sum_{i=1}^{k} \lambda_i b_i$ where $\lambda_i \ge 0$ and 
$\sum_{i=1}^{k} \lambda_i = 1$. 

To reveal the relation between belief states in subspace $\tau(B, a, z)$ and the set $S_{az}$ for 
an $[a, z]$ pair, we define another belief subspace: 

\[ \Big\{ b \;\Big|\; \sum_{s \in S_{az}} b(s) = 1.0,\ \forall s \in S_{az},\ b(s) \ge 0 \Big\}. \tag{5} \]



It can be proven that for any $b$ and $[a, z]$ pair, $\tau(b, a, z)$ must be in the above set. Due to 
this, from now on we abuse the notation $\tau(B, a, z)$ to denote the above set ¹. It is easy 
to see that t{B, a, z) is a simplex in which each extreme point has probability mass on 
one state. The union t{B, A, Z) of these simplexes consists of all possible belief states 
the agent can encounter. 

One example on belief space and subspaces is shown in Figure 2. A POMDP has four 
states and four observations. Its belief space is the tetrahedron ABCD where A, B, C and 
D are extreme belief states. For simplicity, we also use these letters to refer to states. 
Suppose that the $S_{az}$ sets are independent of the actions. More specifically, $S_{z_0} = \{A, B, C\}$, 
$S_{z_1} = \{A, B, D\}$, $S_{z_2} = \{A, C, D\}$, and $S_{z_3} = \{B, C, D\}$. In this POMDP, belief 
simplexes are four facets ABC, ABD, ACD and BCD and belief subspace $\tau(B, A, Z)$ 
is the surface of the tetrahedron. We also note that the subspace $\tau(B, A, Z)$ is much 
smaller than $B$ in size. 
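
The tetrahedron example can be checked mechanically. The sketch below tests membership of a belief state in the set defined by (5) and lists the facets that contain it; the observation labels `z0` to `z3` and the function names are assumptions made for this illustration.

```python
def in_simplex(b, S_az, tol=1e-12):
    """True if belief b (dict state -> probability) lies in the set (5)
    for the state set S_az, i.e. all of its mass sits on S_az."""
    if any(p < -tol for p in b.values()):
        return False
    mass_inside = sum(p for s, p in b.items() if s in S_az)
    mass_total = sum(b.values())
    return abs(mass_inside - 1.0) <= tol and abs(mass_total - 1.0) <= tol


# The four facets of the tetrahedron ABCD (hypothetical observation labels).
FACETS = {
    "z0": {"A", "B", "C"},
    "z1": {"A", "B", "D"},
    "z2": {"A", "C", "D"},
    "z3": {"B", "C", "D"},
}


def simplexes_containing(b):
    """Labels of all facets whose simplex contains belief b."""
    return [z for z, S in FACETS.items() if in_simplex(b, S)]


interior = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
edge_ab = {"A": 0.5, "B": 0.5, "C": 0.0, "D": 0.0}
```

The uniform belief lies strictly inside the tetrahedron and belongs to no facet, whereas a belief on the edge AB belongs to the two facets that share that edge.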

¹ As a matter of fact, $\tau(B, a, z)$ is a subset of the set defined in (5). We mention this here but do 
not discuss it further. 

Fig. 2. Belief space, belief simplexes and belief subspace 



5.2 Value Functions over Subspaces 

Like value functions over $B$, value functions over simplexes preserve the PLC property 
and can be represented by a set of vectors. For convenience, we use the notation $\mathcal{V}_n^{\tau(B,a,z)}$ 
to refer to the representing set of $V_n^{\tau(B,a,z)}$. In informative POMDPs, each vector in the 
set $\mathcal{V}_n^{\tau(B,a,z)}$ can be $|S_{az}|$-dimensional because any belief state in $\tau(B, a, z)$ has zero 
beliefs for states outside the set $S_{az}$. 

Given a collection $\{\mathcal{V}_n^{\tau(B,a,z)}\}$ in which each set $\mathcal{V}_n^{\tau(B,a,z)}$ is associated with an 
underlying state set $S_{az}$, we define a value function $\mathcal{V}_n^{\tau(B,A,Z)}$ over subspace $\tau(B, A, Z)$ 
as follows: for any $b$ in $\tau(B, A, Z)$, 

\[ \mathcal{V}_n^{\tau(B,A,Z)}(b) = \mathcal{V}_n^{\tau(B,a,z)}(b) \tag{6} \]

where $[a, z]$ is a pair such that $b \in \tau(B, a, z)$. The set $\mathcal{V}_n^{\tau(B,A,Z)}$ can be regarded as 
a pool of sets over simplexes. When it needs to determine a value for a belief state, it 
(1) identifies a simplex containing the belief state and (2) computes the value using the 
corresponding set of vectors. The set $\mathcal{V}_n^{\tau(B,A,Z)}$ represents value function $V_n^{\tau(B,A,Z)}$ ². 
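
The two-step lookup just described (identify a simplex, then evaluate with its vectors) can be sketched as follows. The pool layout, a list of `(state set, vectors)` pairs with each vector stored as a dictionary over its state set, is an assumption made for this illustration, as are the example numbers.

```python
def pool_value(pool, b):
    """Value of belief b under a pool of vector sets over simplexes.

    pool : list of (S_az, vectors); each vector is a dict over S_az, so it
           is |S_az|-dimensional (hypothetical layout).
    b    : dict state -> probability.
    """
    for S_az, vectors in pool:
        # Step 1: b lies in this simplex if it has no mass outside S_az.
        if all(p == 0.0 for s, p in b.items() if s not in S_az):
            # Step 2: evaluate max over alpha of alpha . b, restricted to S_az.
            return max(
                sum(alpha[s] * b.get(s, 0.0) for s in S_az) for alpha in vectors
            )
    raise ValueError("belief state is outside every simplex in the pool")


# Two simplexes from the maze example, with invented vectors.
pool = [
    (frozenset({1}), [{1: 2.0}]),
    (frozenset({2, 5}), [{2: 1.0, 5: 0.0}, {2: 0.0, 5: 1.0}]),
]
v_single = pool_value(pool, {1: 1.0})
v_pair = pool_value(pool, {2: 0.25, 5: 0.75})
```

The sketch returns the value from the first simplex that contains the belief state; this relies on the sets in the pool agreeing wherever simplexes overlap.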



5.3 DP Update over Subspace 

In this subsection, we show how to conduct implicit DP update over belief subspace. 
The problem is cast as: given a set $\mathcal{V}_n^{\tau(B,A,Z)}$ representing value function $V_n^{\tau(B,A,Z)}$ 
over subspace $\tau(B, A, Z)$, how to compute a set $\mathcal{V}_{n+1}^{\tau(B,A,Z)}$. 

To compute the set $\mathcal{V}_{n+1}^{\tau(B,A,Z)}$, we construct one set $\mathcal{V}_{n+1}^{\tau(B,a',z')}$ for any possible pair 
$[a', z']$. Before doing so, we recall how DP update over belief space constructs a vector 
in set $T\mathcal{V}_n$. It is known that each vector in $\mathcal{V}_{n+1}$ can be defined by a pair of action and 
a mapping from the set of observations to the set $\mathcal{V}_n$. Let us denote the action by $a$ and 
the mapping by $\delta$. For an observation $z$, we use $\delta_z$ to denote the mapped vector in $\mathcal{V}_n$. 
Given an action $a$ and a mapping $\delta$, the vector, denoted by $\beta_{a,\delta}$, is defined as follows: 
for each $s$ in $S$, 

\[ \beta_{a,\delta}(s) = r(s, a) + \lambda \sum_{z} \sum_{s'} P(s'|s, a) P(z|s', a)\, \delta_z(s'). \]

² Although we refer to $\mathcal{V}_n^{\tau(B,A,Z)}$ as a set, it can not be understood as $\cup_{a,z} \mathcal{V}_n^{\tau(B,a,z)}$. This union 
induces a different value function from (6). 

By enumerating all possible combinations of actions and mappings, one can define 
different vectors. All these vectors form a set $\mathcal{V}_{n+1}$, i.e., $\{\beta_{a,\delta} \mid a \in A,\ \delta: Z \to \mathcal{V}_n\}$. It 
turns out that this set represents value function $V_{n+1}$. 

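We move forward to define a vector in $\mathcal{V}_{n+1}^{\tau(B,a',z')}$ given the set $\mathcal{V}_n^{\tau(B,A,Z)}$ represented 
as a pool of sets $\{\mathcal{V}_n^{\tau(B,a,z)}\}$ where vectors in set $\mathcal{V}_n^{\tau(B,a,z)}$ are $|S_{az}|$-dimensional. Similar 
to the case in DP update $T\mathcal{V}_n$, a vector in set $\mathcal{V}_{n+1}^{\tau(B,a',z')}$ can be defined by a pair of action 
$a$ and a mapping $\delta$ but with two important modifications. First, the mapping $\delta$ is from 
the set of observations to the set $\mathcal{V}_n^{\tau(B,A,Z)}$. Moreover, for an observation $z$, $\delta_z$ is a vector 
in $\mathcal{V}_n^{\tau(B,a,z)}$. Second, the vector only needs to be defined over the set $S_{a'z'}$ because the 
beliefs at other states are known to be zero. To be precise, given a pair $[a', z']$, an action 
$a$ and a mapping $\delta$, a vector, denoted by $\beta_{a,\delta}$, can be defined as follows: for each $s$ in 
$S_{a'z'}$, 

\[ \beta_{a,\delta}(s) = r(s, a) + \lambda \sum_{z} \sum_{s' \in S_{az}} P(s'|s, a) P(z|s', a)\, \delta_z(s'). \]

If we enumerate all possible combinations of actions and mappings above, we can 
define various vectors. These vectors form a set 

\[ \{\beta_{a,\delta} \mid a \in A,\ \delta: Z \to \mathcal{V}_n^{\tau(B,A,Z)}\ \&\ \forall z,\ \delta_z \in \mathcal{V}_n^{\tau(B,a,z)}\}. \]

The set is denoted by $\mathcal{V}_{n+1}^{\tau(B,a',z')}$. Note that vectors in the set are $|S_{a'z'}|$-dimensional. It 
can be proved that the set represents value function $V_{n+1}^{\tau(B,a',z')}$. 

Proposition 1 For any $b \in \tau(B, a, z)$, $\mathcal{V}_{n+1}^{\tau(B,a,z)}(b) = V_{n+1}(b)$. □ 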

For now, we are able to construct a set $\mathcal{V}_{n+1}^{\tau(B,a,z)}$ for a pair $[a, z]$. A complete DP 
update over $\tau(B, A, Z)$ needs to construct such sets for all possible pairs of actions and 
observations. After these sets are constructed, they are pooled together to form a set 
$\mathcal{V}_{n+1}^{\tau(B,A,Z)}$. It induces a value function as defined in (6). 

Theorem 2 For any $b \in \tau(B, A, Z)$, $\mathcal{V}_{n+1}^{\tau(B,A,Z)}(b) = V_{n+1}(b)$. □ 

It is worthwhile mentioning that DP update often computes representing sets 
that are minimal in size. The set $\mathcal{V}_{n+1}^{\tau(B,A,Z)}$ is said to be minimal if each $\mathcal{V}_{n+1}^{\tau(B,a,z)}$ is minimal. Given 
a set $\mathcal{V}_{n+1}^{\tau(B,a,z)}$, its minimal set can be obtained by using the normal prune procedure. 
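
The construction of the vectors $\beta_{a,\delta}$ for one target pair $[a', z']$ can be sketched by plain enumeration. This is an unoptimized illustration under assumed data structures (dictionary-based model, pool indexed by `(a, z)`); it performs no pruning, and the experiments reported in this paper use incremental pruning rather than naive enumeration.

```python
from itertools import product


def dp_update_for_pair(target_states, actions, observations, lam,
                       reward, P_trans, P_obs, S, pool):
    """Enumerate the vectors beta_{a,delta} restricted to `target_states`.

    S[(a, z)]    : state set S_az
    pool[(a, z)] : list of vectors (dicts over S_az) for simplex tau(B, a, z)
    All layouts are hypothetical; no extraneous vectors are removed.
    """
    new_vectors = []
    for a in actions:
        # A mapping delta picks, for every z, one vector from pool[(a, z)].
        choices = [pool[(a, z)] for z in observations]
        for delta in product(*choices):
            beta = {}
            for s in target_states:
                future = 0.0
                for z, delta_z in zip(observations, delta):
                    for s_next in S[(a, z)]:
                        future += (P_trans[(s, a)].get(s_next, 0.0)
                                   * P_obs[(s_next, a)].get(z, 0.0)
                                   * delta_z[s_next])
                beta[s] = reward[(s, a)] + lam * future
            new_vectors.append(beta)
    return new_vectors


# Toy model (invented): observation "u" reveals state 0, "v" reveals state 1.
reward = {(0, "a"): 0.0, (1, "a"): 1.0}
P_trans = {(0, "a"): {0: 0.5, 1: 0.5}, (1, "a"): {1: 1.0}}
P_obs = {(0, "a"): {"u": 1.0}, (1, "a"): {"v": 1.0}}
S = {("a", "u"): {0}, ("a", "v"): {1}}
pool = {("a", "u"): [{0: 0.0}], ("a", "v"): [{1: 1.0}, {1: 2.0}]}
vectors_u = dp_update_for_pair([0], ["a"], ["u", "v"], 0.5,
                               reward, P_trans, P_obs, S, pool)
```

The number of enumerated vectors is the sum over actions of the product of the pool sizes, which is why pruning matters in practice.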

5.4 Complexity Analysis 

DP update $T\mathcal{V}_n$ improves values for belief space $B$, while DP update from $\mathcal{V}_n^{\tau(B,A,Z)}$ to 
$\mathcal{V}_{n+1}^{\tau(B,A,Z)}$ improves values for belief subspace $\tau(B, A, Z)$. Since the subspace is much 
smaller than $B$ in an informative POMDP, one expects: (1) fewer vectors are needed to 
represent a value function over a subspace; (2) since identifying useful vectors requires solving 
linear programs, this would lead to computational gains in time cost. Our experimental 
studies have confirmed these two expectations. 

5.5 Value Iteration over Belief Subspace 

Value iteration over subspace starts with a value function $\mathcal{V}_0^{\tau(B,A,Z)}$. Each $\mathcal{V}_0^{\tau(B,a,z)}$ 
can be set to be a zero vector. As the iterations proceed, the Bellman residual becomes 
smaller between two consecutive value functions over $\tau(B, A, Z)$. In our experiments, 
the threshold for the stopping criterion is set to be $\epsilon(1 - \lambda)/2\lambda$. 



6 Experiments 

Experiments have been designed to test the performance of value iteration algorithms 
with and without exploiting problem characteristics. Here we report results on the maze 
problem in Figure 1. It has ten locations (states). At each time point, the agent can perform 
one of four “move” actions along different directions and a declare-goal action. The 
“move” actions are nondeterministic: they can achieve intended effect with probability 
0.8 but lead to overshooting with probability 0.1. The declare-goal action does not change 
the agent’s position. As mentioned previously, an observation is a string of four letters. 
At each time point, the robot receives one among six observations with certainty: owww, 
owow, owoo, wwow, wowo and woww. At locations 9 and 10, the declare-goal action yields 
rewards of +1 and -1 respectively. Any other combinations of actions and states yield 
no reward. 

This POMDP is informative since any observation can restrict the world into only 
one or two states. If value iteration is conducted without exploiting informativeness, one 
needs to improve values over space $B = \{b \mid \sum_{i=1}^{10} b(s_i) = 1.0,\ b(s_i) \ge 0\}$. Since the 
observations are independent of actions, DP update over subspace needs to account for a 
union of six simplexes determined by observations. It is much smaller than $B$. 

Our experiments are conducted on a SUN SPARC workstation. The discount factor 
is set to 0.95. The precision parameter is set to 0.000001. The quality requirement $\epsilon$ is set 
to 0.01. In our experiments, incremental pruning [15,6] is used to compute sets of vectors 
representing value functions over belief space or subspace. For convenience, we use VI1 
and VI to refer to the value iteration algorithms with and without exploiting regularities 
respectively. We compare VI and VI1 at each iteration along two dimensions: the size 
of the set representing the value function and the time cost to conduct a DP update. The results are 
presented in Figure 3. Note that the y-axis is drawn in log-scale in the figure. 

The first chart depicts the number of vectors generated at each iteration for VI and 
VI1. In VI, at each iteration, we collect the size of the minimal set representing the value 
function. In VI1, we compute six sets representing value functions over six simplexes 
and report the sum of the sizes of these sets. For this problem, except for the first iterations, 
VI generates significantly more vectors than VI1. In VI, the number of vectors tends to 
grow sharply for the first iterations. At the 10th iteration, the number reaches its peak of 2500. 
Afterwards, the number of vectors decreases slowly as value iteration proceeds. When 
value iteration terminates, it produces 121 vectors. In contrast, the number of vectors 
generated by VI1 is much smaller. Our experiments show that the maximum number is 
below 28. After VI1 terminates, the value function is represented by only 12 vectors. 
This suggests that far fewer vectors are needed to represent value functions over a 
belief subspace. 

[Figure: two charts, each with x-axis "number of iterations"] 

Fig. 3. Comparative study on value iterations over belief space and belief subspace 

Due to the big difference between numbers of vectors generated by VI1 and VI, 
VI1 is significantly more efficient than VI. This is demonstrated in the second chart 
in Figure 3. When VI1 terminates after 162 iterations, it takes around 40 seconds. On 
average, one DP update takes less than 0.25 seconds. VI also terminates after 162 
iterations but takes 20,361 seconds. On average, each iteration takes around 125 
seconds. Compared with VI, VI1 is drastically more efficient. 
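
The per-update averages follow directly from the reported totals. As a quick check, taking the VI1 total as exactly 40 seconds (the text only says "around 40"), the figures work out to:

```latex
\[
\frac{40}{162} \approx 0.247 \text{ s per DP update (VI1)}, \qquad
\frac{20361}{162} \approx 125.7 \text{ s per DP update (VI)}, \qquad
\frac{20361}{40} \approx 509 .
\]
```

Under that assumption, the overall speedup on this problem is roughly a factor of 500.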

7 Related Work 

In this paper, we provide a general framework for conducting value iteration over sub- 
space and apply the principles to a special class of POMDPs. The subspace we considered 
here can be viewed as an application of reachability analysis (e.g. [8]). We also note that 
some work proposes to decompose value functions in order to reduce their complex- 
ity [4,10]. In connection to special classes of POMDPs, the assumption in informative 
POMDPs is very similar to that in region-observable POMDPs [15]. Both assume that 
observations can restrict the world into a set of a handful of states. However, the 
motivations behind them are complementary rather than competitive. Region-observable 
POMDPs are proposed to approximate general POMDPs and our work shows that their 
problem characteristic can be exploited to find solutions more efficiently. Some other 
POMDP classes have been examined in the literature. These include memory-resetting 
POMDPs in [9] and near-discernible POMDPs in [16]. 

8 Future Directions 

As our preliminary experiments suggest, conducting value iteration over subspace seems 
to be a promising area. In informative POMDPs, the subspace containing all possible 
belief states an agent can encounter has a clear semantics. One future direction is to 
study how to describe such a subspace even when the observations are non-informative. It is 
plausible that such a subspace can still be much smaller than belief space under some 
circumstances. 

Although our experiments are implemented for flat-space POMDPs, we note that 
the same idea is applicable to structural POMDPs (e.g. see [3]). Another direction is to 
combine the representational advantage in structural POMDPs and the computational 
advantage in conducting value iteration over belief subspace. 

Abstract. Finding optimal policies for general partially observable Markov decision 
processes (POMDPs) is computationally difficult primarily due to the need to 
perform dynamic-programming (DP) updates over the entire belief space. In this 
paper, we first study a somewhat restrictive class of special POMDPs called 
almost-discernible POMDPs and propose an anytime algorithm called space-progressive 
value iteration (SPVI). SPVI does not perform DP updates over the entire belief 
space. Rather it restricts DP updates to a belief subspace that grows over time. It is 
argued that given sufficient time SPVI can find near-optimal policies for 
almost-discernible POMDPs. We then show how SPVI can be applied to a more general 
class of POMDPs. Empirical results are presented to show the effectiveness of 
SPVI. 

1 Introduction 

Partially observable Markov decision processes (POMDPs) provide a general framework 
for AI planning problems where effects of actions are nondeterministic and the state of 
the world is not known with certainty. Unfortunately, finding optimal policies for general 
POMDPs is computationally very difficult [8]. Despite much recent progress [6,4,5,11], 
our ability to battle the computational complexity of general POMDPs is still limited. 
It is therefore advisable to study special classes of POMDPs and design special-purpose 
algorithms for them. 

Several classes of special POMDPs have been previously investigated. For example, 
Hansen [4] has studied memory-resetting POMDPs, where there are actions that give 
perfect information about the current state of the world. Zhang and Liu [10] have exam- 
ined region-observable POMDPs, where one always knows that the world must be in 
one set of a handful of possible states. 

General POMDP problems are difficult to solve primarily due to the need to perform 
dynamic programming (DP) update over the entire belief space. Solving special POMDP 
problems is easier because they permit one to focus on a subspace. In fully observable 
Markov decision processes (MDPs), for instance, one needs to deal only with the set 
of extreme belief states. In memory resetting POMDPs, one needs to deal only with 
extreme belief states and those reachable from them within a limited number of steps. 

This paper considers POMDPs where there are two types of actions that are informally 
said to be information-rich and information-poor respectively. An information-rich 
action, when executed, always gives one a good idea about the current state of the 
world. In other words, the possible belief states after an information-rich action comprise 
only a small subspace of the belief space. An information-poor action, on the other 
hand, provides little or no information. We call this class of POMDPs near-discernible 
POMDPs since one can get a good idea about the current state of the world by executing 
an information-rich action. 

It is arguable that near-discernible POMDPs are general enough for many applica- 
tions with a large flat state space. In some path planning problems, for instance, actions 
can be divided into those that gather information and those that affect the state of the 
world. Often actions in the first category are information-rich and those in the second 
are information-poor. 

We propose an anytime algorithm for near-discernible POMDPs. The algorithm is 
called space-progressive value iteration (SPVI). To avoid high complexity, SPVI does 
not perform DP updates over the entire belief space. Rather, it restricts DP updates and 
hence value iteration to a belief subspace and, when convergence is reached, expands the 
subspace if time permits. So, SPVI works in a fashion similar to the envelope algorithm 
for MDPs [3]. 

For technical reasons, we develop SPVI first for a subclass of near-discernible 
POMDPs called almost-discernible POMDPs. In this subclass, it is easier to discuss 
what initial belief subspace to begin with and how to expand a belief subspace. It is also 
possible to argue for optimality of the policies found. 

SPVI has been evaluated on a number of test problems. Some of the problems are 
solvable by the exact algorithm described in [11], which is probably the most efficient 
exact algorithm. For those problems, SPVI was able to find near-optimal policies in 
much less time. For the other problems, SPVI was still able to find near-optimal policies 
within acceptable time. 



2 Technical Background 

A POMDP is a sequential decision model for an agent who acts in a stochastic environ- 
ment with only partial knowledge about the state of the world. In a POMDP model, the 
environment is described by a set of states S. The agent changes the states by executing 
one of a finite set of actions A. At each point in time, the world is in one state s. Based
on the information it has, the agent chooses and executes an action a. Consequently, it 
receives an immediate reward r(s, a) and the world moves stochastically into another 
state s' according to a transition probability P(s'|s, a). Thereafter, the agent receives
an observation z from a finite set Z according to an observation probability P(z|s', a).
The process repeats itself. 

Information that the agent has about the current state of the world can be summarized 
by a probability distribution over S [1]. The probability distribution is called a belief state
and is denoted by b. The set of all possible belief states is called the belief space and is 
denoted by B. If the agent observes z after taking action a in belief state b, its next belief 
state b' is updated as 



$$b'(s') = \frac{\sum_{s} P(z|s', a)\, P(s'|s, a)\, b(s)}{\sum_{s'} \sum_{s} P(z|s', a)\, P(s'|s, a)\, b(s)}.$$
We will sometimes denote this new belief state by $\tau(b, a, z)$. The denominator is the
probability of observing z after taking action a in belief state b and will be denoted by
P(z|b, a).
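
The belief update can be sketched in Python as follows. This is only an illustration: the nested-list layout `T[a][s][s2]` for P(s'|s, a) and `O[a][s2][z]` for P(z|s', a), as well as the function name `belief_update`, are assumptions made for this sketch and are not part of the original formulation.

```python
def belief_update(b, a, z, T, O):
    """Return (tau(b, a, z), P(z|b, a)) for a belief b given as a list of floats.

    Assumed (hypothetical) layout: T[a][s][s2] = P(s2|s, a), O[a][s2][z] = P(z|s2, a).
    """
    n = len(b)
    # Numerator of the update for every successor state s2.
    unnormalized = [
        O[a][s2][z] * sum(T[a][s][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    p_z = sum(unnormalized)  # the denominator, i.e. P(z|b, a)
    if p_z == 0.0:
        return None, 0.0  # z cannot be observed after taking a in b
    return [u / p_z for u in unnormalized], p_z
```

The function returns the normalizing constant alongside the new belief because P(z|b, a) is needed again by the DP update.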

A policy prescribes an action for each possible belief state. In other words, it is a 
mapping from B to A. Associated with a policy $\pi$ is its value function $V^\pi$. For each belief
state b, $V^\pi(b)$ is the expected total discounted reward that the agent receives by following
the policy starting from b, i.e. $V^\pi(b) = E_{\pi,b}\left[\sum_{t=0}^{\infty} \lambda^t r_t\right]$, where $r_t$ is the reward received
at time t and $\lambda$ ($0 < \lambda < 1$) is the discount factor. It is known that there exists a policy $\pi^*$
such that $V^{\pi^*}(b) \ge V^\pi(b)$ for any other policy $\pi$ and any belief state b. Such a policy
is called an optimal policy. The value function of an optimal policy is called the optimal
value function. We denote it by $V^*$. For any positive number $\epsilon$, a policy $\pi$ is $\epsilon$-optimal
if $V^\pi(b) + \epsilon \ge V^*(b)$ for any belief state b. Without explicitly referring to $\epsilon$, we will
sometimes say that such a policy is near-optimal. 

The dynamic programming (DP) update operator T maps a value function V to
another value function TV that is defined as follows: for any b in B,

$$TV(b) = \max_{a} \Big[ r(b, a) + \lambda \sum_{z} P(z|b, a)\, V(\tau(b, a, z)) \Big],$$

where $r(b, a) = \sum_{s} r(s, a)\, b(s)$ is the expected immediate reward for taking action a in
belief state b.

Value iteration is an algorithm for finding $\epsilon$-optimal policies. It starts with an initial
value function $V_0$ and iterates using the formula $V_n = TV_{n-1}$. Because T is a contraction
mapping, $V_n$ converges to $V^*$ as n goes to infinity. Value iteration terminates when
the Bellman residual $\max_b |V_n(b) - V_{n-1}(b)|$ falls below $\epsilon(1 - \lambda)/2\lambda$. When it does,
the so-called $V_n$-improving policy given below is $\epsilon$-optimal: for any b in B,

$$\pi(b) = \arg\max_{a} \Big[ r(b, a) + \lambda \sum_{z} P(z|b, a)\, V_n(\tau(b, a, z)) \Big].$$
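
A one-step lookahead at a single belief state, as used by both the DP update and the improving policy, might look as follows in Python. The sketch assumes a value function supplied as a callable `V`, rewards stored as `R[s][a]`, and the same hypothetical layout `T[a][s][s2]`, `O[a][s2][z]` for the transition and observation probabilities; none of these names come from the original text.

```python
def tau(b, a, z, T, O):
    """Belief update; returns (next belief or None, P(z|b, a))."""
    n = len(b)
    u = [O[a][s2][z] * sum(T[a][s][s2] * b[s] for s in range(n)) for s2 in range(n)]
    p_z = sum(u)
    return ([x / p_z for x in u] if p_z > 0 else None), p_z

def q_value(b, a, V, R, T, O, lam):
    """r(b, a) + lam * sum_z P(z|b, a) * V(tau(b, a, z))."""
    n = len(b)
    immediate = sum(R[s][a] * b[s] for s in range(n))
    future = 0.0
    for z in range(len(O[a][0])):
        b_next, p_z = tau(b, a, z, T, O)
        if p_z > 0:  # skip observations that cannot occur
            future += p_z * V(b_next)
    return immediate + lam * future

def improving_action(b, V, R, T, O, lam):
    """Action prescribed at b by the V-improving policy."""
    return max(range(len(T)), key=lambda a: q_value(b, a, V, R, T, O, lam))
```

Taking the maximum of `q_value` over actions gives TV(b) at that single belief state; the exact algorithms discussed here operate on whole vector sets rather than on individual points.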



Functions over the state space S are sometimes referred to as vectors. A set $\mathcal{V}$
of vectors represents a value function f in a subspace B' of belief states if
$f(b) = \max_{\alpha \in \mathcal{V}} \alpha \cdot b$ for any b in B'. The notation $\alpha \cdot b$ means the inner product of $\alpha$ and b. We will
sometimes write f(b) as $\mathcal{V}(b)$. A vector in a set is extraneous in B' if, after its removal,
the set represents the same value function in B'. It is useful in B' otherwise. If a value
function f is representable by a set of vectors in B', then there is a minimum set that
represents f in B'. None of the vectors in this unique set are extraneous in B'. A value
function that is representable as a finite set of vectors in the entire belief space is said to
be piecewise linear and convex (PLC).
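
Evaluating a value function that is represented by a set of vectors amounts to taking a maximum of inner products. A minimal Python sketch, with the function names chosen here purely for illustration:

```python
def dot(alpha, b):
    """Inner product of a vector alpha and a belief state b."""
    return sum(x * y for x, y in zip(alpha, b))

def value(vectors, b):
    """Value at b of the function represented by the given set of vectors."""
    return max(dot(alpha, b) for alpha in vectors)

def maximizing_vector(vectors, b):
    """Index of a vector that attains the maximum at b."""
    return max(range(len(vectors)), key=lambda i: dot(vectors[i], b))
```

Because each inner product is linear in b and the maximum of linear functions is convex, any function evaluated this way is piecewise linear and convex.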

Since there are infinitely many possible belief states, value functions cannot be 
explicitly represented as a table. Sondik [9] has shown that if a value function V is PLC, 
then so is TV. Consequently, if the initial value function is PLC, every value function 
generated by value iteration is PLC and hence can be represented by a finite number of 
vectors. 
3 Space-Progressive Value Iteration in Almost-Discernible POMDPs

In this section, we define almost-discernible POMDPs and outline space-progressive 
value iteration (SPVI). We also explain why SPVI is feasible and argue that given suffi- 
cient time it can find near-optimal policies. 



3.1 Almost-Discernible POMDPs

For any action a, let $Z_a$ be the set of observations that are possible after taking action a.
For any z in $Z_a$, define $S_{az} = \{s : P(z|s, a) > 0\}$. If z is observed after action a, the true
state must be in the set $S_{az}$ and one's belief state must be in $B_{az} = \{b : b(s) = 0\ \forall s \notin S_{az}\}$.

If there is an integer k such that $|S_{az}| \le k$ for every z in $Z_a$, then executing a would
narrow the true state down to no more than k possibilities. When this is the case, we
say that the action is k-focal. For convenience, we will use the term information-opulent
actions to refer to actions that are k-focal for some small integer k.

We say that a POMDP is k-discernible if it has one or more k-focal actions and
all other actions are information-void in the sense that they do not yield any observations.
For convenience, we will use the term almost-discernible POMDPs to refer to POMDPs
that are k-discernible for some small k. To simplify exposition, we assume that there is
only one information-opulent action and will denote it by d. Note that after executing
d, one's belief state must be in $\bigcup_{z \in Z_d} B_{dz}$, where the union is taken over all possible
observations that d might produce.
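
These definitions can be checked mechanically from the observation model. The Python sketch below assumes the observation probabilities are stored as `O[a][s][z]` = P(z|s, a) and that an information-void action is modelled as one with a single possible observation (the void observation); both conventions, and all function names, are assumptions of this sketch.

```python
def support_sets(a, O):
    """Map each observation z possible after action a to S_az = {s : P(z|s, a) > 0}."""
    n_states, n_obs = len(O[a]), len(O[a][0])
    sets = {}
    for z in range(n_obs):
        S = {s for s in range(n_states) if O[a][s][z] > 0}
        if S:  # z belongs to Z_a only if some state can produce it
            sets[z] = S
    return sets

def focal_degree(a, O):
    """Smallest k for which action a is k-focal (assumes Z_a is non-empty)."""
    return max(len(S) for S in support_sets(a, O).values())

def is_information_void(a, O):
    """True if a yields only one observation, taken here to be the void one."""
    return len(support_sets(a, O)) == 1

def is_k_discernible(O, k):
    """At least one k-focal action, and every other action is information-void."""
    actions = range(len(O))
    focal = [a for a in actions if focal_degree(a, O) <= k]
    others_void = all(is_information_void(a, O) for a in actions if a not in focal)
    return bool(focal) and others_void
```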

Almost-discernible POMDPs are obviously a generalization of memory-resetting
POMDPs. They also generalize region-observable POMDPs: An almost-discernible 
POMDP is region-observable if all actions are information-opulent. 



3.2 Space-Progressive Value Iteration 

SPVI starts with the belief subspace $\bigcup_{z \in Z_d} B_{dz}$. It performs value iteration restricted to
the subspace and, when convergence is reached, expands the subspace. SPVI continues
subspace expansions as long as time permits. When time runs out, SPVI returns the best
policy found so far.*



Belief Subspace Representation. In this paper, we consider only subspaces that are 
unions of belief simplexes. A belief simplex is specified by a list of extreme points. 
The simplex with extreme points $b_1, b_2, \ldots, b_k$ consists of all belief states of the form
$\sum_{i=1}^{k} \lambda_i b_i$, where $\lambda_i \ge 0$ and $\sum_{i=1}^{k} \lambda_i = 1$.

The initial belief subspace $\bigcup_{z \in Z_d} B_{dz}$ is a union of belief simplexes. To see this,
define, for any state s, $\chi_s$ to be the extreme belief state such that $\chi_s(s') = 1$ if and
only if s' = s. It is clear that $B_{dz}$ is a simplex with the following set of extreme points:
$\{\chi_s : s \in S_{dz}\}$.
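
A simplex can thus be stored as a plain list of extreme points. The following Python sketch builds the initial simplexes from the sets $S_{dz}$ and forms a point of a simplex from its weights; the dictionary input and the function names are illustrative assumptions.

```python
def unit_belief(s, n_states):
    """Extreme belief state chi_s: all probability mass on state s."""
    return [1.0 if i == s else 0.0 for i in range(n_states)]

def initial_simplexes(supports, n_states):
    """supports maps each z to the set S_dz; returns z -> extreme points of B_dz."""
    return {z: [unit_belief(s, n_states) for s in sorted(S)]
            for z, S in supports.items()}

def simplex_point(extreme_points, weights):
    """Convex combination sum_i weights[i] * extreme_points[i]."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(extreme_points[0])
    return [sum(w * p[i] for w, p in zip(weights, extreme_points)) for i in range(n)]
```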

* The quality of a policy can be determined by simulation. 
In the rest of this subsection, we consider a belief subspace B' that consists of m
simplexes $B'_1, \ldots, B'_m$. We explain how to restrict DP update to B', how to guarantee
the convergence of value iteration when restricted to B', and how to expand the belief
subspace B' when convergence is reached.



Restricted DP Update. DP update takes as input the minimum set $\mathcal{V}$ of vectors that represents
a PLC value function V and computes the minimum set $\mathcal{U}$ of vectors that represents
TV in the entire belief space. A simple and comparatively efficient algorithm for DP
update is incremental pruning [10,2]. It solves a series of linear programs. Each linear
program involves a vector $\alpha$ and a set of other vectors W. The purpose of the linear
program is to determine whether there is a belief state b where $\alpha$ dominates the vectors in
W, i.e. $\alpha \cdot b > \beta \cdot b$ for all $\beta \in W$. The linear program is as follows:

Maximize: x.

Constraints:

$b \cdot \alpha \ge x + b \cdot \beta \quad \forall \beta \in W$
$\sum_{s} b(s) = 1, \quad b(s) \ge 0 \quad \forall s \in S$

The vector $\alpha$ dominates the vectors in W at some belief states if and only if the optimal value
of x is positive. We will refer to this linear program as $LP(\alpha, W, B)$.

Restricting DP update to the belief subspace B' means computing the minimum set
$\mathcal{U}'$ of vectors that represents TV in that subspace. To do so, SPVI first does this in each
belief simplex $B'_j$, resulting in a set of vectors $\mathcal{U}'_j$. It then takes the union of those sets.
Some vectors in the union might be extraneous in B'. All such vectors can be identified 
and removed by checking for duplicates. 

Restricting DP update to a simplex of belief states requires little change to incremental
pruning. Let $B'_j$ be the simplex with extreme points $b_1, b_2, \ldots, b_k$. To restrict
DP update to $B'_j$, all one has to do is to modify each linear program $LP(\alpha, W, B)$ as
follows:

Maximize: x.

Constraints:

$\sum_{i=1}^{k} \lambda_i\, b_i \cdot \alpha \ge x + \sum_{i=1}^{k} \lambda_i\, b_i \cdot \beta \quad \forall \beta \in W$
$\sum_{i=1}^{k} \lambda_i = 1, \quad \lambda_i \ge 0 \quad \forall\, 1 \le i \le k$

We will refer to this linear program as $LP(\alpha, W, B'_j)$.
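
To make the size of the restricted program concrete, the Python sketch below assembles its coefficient matrices over the variables (x, $\lambda_1$, ..., $\lambda_k$). It only builds the program and does not solve it; the function name `restricted_lp` and the "minimize subject to inequality and equality rows" output convention are assumptions of this sketch.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def restricted_lp(alpha, W, extreme_points):
    """Build LP(alpha, W, B'_j) over the variables (x, lambda_1, ..., lambda_k).

    Output convention: minimize c.v subject to A_ub v <= b_ub and A_eq v = b_eq.
    The variable x is free and every lambda_i must be >= 0; these bounds are
    left to whichever solver is used.
    """
    k = len(extreme_points)
    c = [-1.0] + [0.0] * k  # maximizing x is minimizing -x
    # One row per beta in W: x + sum_i lambda_i * (b_i.beta - b_i.alpha) <= 0
    A_ub = [[1.0] + [dot(b, beta) - dot(b, alpha) for b in extreme_points]
            for beta in W]
    b_ub = [0.0] * len(W)
    A_eq = [[0.0] + [1.0] * k]  # sum_i lambda_i = 1
    b_eq = [1.0]
    return c, A_ub, b_ub, A_eq, b_eq
```

The returned arrays could be handed to any LP solver that accepts this form, for instance `scipy.optimize.linprog` with bounds `(None, None)` for x and `(0, None)` for each weight. Note that only the inner products of the vectors with the k extreme points enter the program, regardless of how many states there are.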

Restricting DP update to a subspace reduces computational complexity for two reasons.
First, the number of variables in $LP(\alpha, W, B'_j)$ is k+1, while that in $LP(\alpha, W, B)$
is |S|+1. When solving near-discernible POMDPs, k is always much smaller than |S|.
Consequently, the numbers of variables in linear programs are reduced. Second, one
needs fewer vectors to represent TV in a subspace than in the entire belief space. This
implies fewer linear programs and fewer constraints in linear programs.



Restricted Value Iteration. Restricting value iteration to a belief subspace means to 
restrict DP update to the subspace at each iteration. Unlike unrestricted DP update, 
restricted DP update is not necessarily a contraction mapping. There is therefore an 
issue of ensuring the convergence of restricted value iteration. A simple solution is to
start SPVI with a value function that is sufficiently small so that it is upper bounded by 
the optimal value function and to maintain monotonicity during restricted value iteration. 

For each simplex $B'_j$, let $\mathcal{V}_j$ be the minimum set of vectors that represents V in the
simplex. This was computed in the previous iteration. After DP update in $B'_j$, we get
another set $\mathcal{U}'_j$ of vectors. To maintain monotonicity, SPVI takes the union $\mathcal{V}_j \cup \mathcal{U}'_j$ and
then prunes vectors that are extraneous in $B'_j$ by solving linear programs of the form
$LP(\alpha, W, B'_j)$.
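
Before any linear program is solved, a cheaper filter can already discard some extraneous vectors. The Python sketch below is not the LP-based pruning described above but a simpler sufficient test, introduced here as an assumption: if one vector is at least as good as another at every extreme point of a simplex, then by linearity it is at least as good everywhere in the simplex, so the weaker vector is extraneous there.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def dominated_on_simplex(alpha, beta, extreme_points):
    """True if beta is at least as good as alpha at every extreme point."""
    return all(dot(beta, b) >= dot(alpha, b) for b in extreme_points)

def merge_and_prefilter(V_j, U_j, extreme_points):
    """Union of the old and new vector sets minus vectors dominated by another one.

    The result represents the same function on the simplex as the full union,
    but it may still contain extraneous vectors that only an LP would detect.
    """
    kept = []
    for alpha in V_j + U_j:
        if any(dominated_on_simplex(alpha, beta, extreme_points) for beta in kept):
            continue  # alpha adds nothing inside the simplex (covers duplicates too)
        kept = [beta for beta in kept
                if not dominated_on_simplex(beta, alpha, extreme_points)]
        kept.append(alpha)
    return kept
```

In the test case used to check this sketch, a vector with a large entry for a state outside the simplex is discarded, which illustrates why fewer vectors are needed in a subspace than in the entire belief space.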

To determine whether to terminate restricted value iteration, SPVI considers the
quantity $\max_{1 \le j \le m} \max_{b \in B'_j} [\mathcal{U}'_j(b) - \mathcal{V}_j(b)]$, which can be computed by solving linear
programs of the form $LP(\alpha, \mathcal{V}_j, B'_j)$. It terminates restricted value iteration when the
quantity drops below a predetermined threshold. In our experiments, it is set at
$0.1(1 - \lambda)/2\lambda$, the threshold for the Bellman residual that exact value iteration would use in order
to find 0.1-optimal policies.
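
As a small worked example of this threshold, the following Python helper (the name `residual_threshold` is hypothetical) evaluates $\epsilon(1 - \lambda)/2\lambda$. With $\epsilon = 0.1$ and the discount factor 0.95 used in the experiments reported below, the threshold is roughly 0.0026.

```python
def residual_threshold(epsilon, lam):
    """Bellman-residual threshold epsilon * (1 - lam) / (2 * lam)."""
    if not 0.0 < lam < 1.0:
        raise ValueError("the discount factor must lie strictly between 0 and 1")
    return epsilon * (1.0 - lam) / (2.0 * lam)

threshold = residual_threshold(0.1, 0.95)  # about 0.0026
```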



Belief Subspace Expansion. When restricting value iteration to B', we are concerned 
only with values for belief states in B' . However, values for some belief states in B' might 
depend on values for belief states outside B'. Hence it is natural to, when convergence 
is reached in B', expand B' to include all such belief states. 

Let V and U be the second last and the last value functions produced by restricted
value iteration before it converges in B'. Further let $\pi$ be the V-improving policy. The
value of U at a belief state b in B' depends on the values of V at belief states that are
reachable from b by executing action $\pi(b)$.

For any belief simplex $B'_j$, any action a, and any z in $Z_a$, define
$\tau(B'_j, a, z) = \{\tau(b, a, z) : b \in B'_j\}$. Suppose $B'_j$ has k extreme points $b_1, b_2, \ldots, b_k$. It can be shown
that $\tau(B'_j, a, z)$ is a simplex with the following set of extreme points:

$\{\tau(b_i, a, z) : i \in \{1, 2, \ldots, k\} \text{ and } P(z|b_i, a) > 0\}$

Note that $\tau(B'_j, a, z)$ has no more extreme points than $B'_j$.
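
Computing the image of a simplex therefore only requires updating its extreme points. A minimal Python sketch under the same assumed layout as before (`T[a][s][s2]` = P(s'|s, a), `O[a][s2][z]` = P(z|s', a); the names are hypothetical):

```python
def tau(b, a, z, T, O):
    """Belief update; returns (next belief or None, P(z|b, a))."""
    n = len(b)
    u = [O[a][s2][z] * sum(T[a][s][s2] * b[s] for s in range(n)) for s2 in range(n)]
    p_z = sum(u)
    return ([x / p_z for x in u] if p_z > 0 else None), p_z

def image_simplex(extreme_points, a, z, T, O):
    """Extreme points of the image of a simplex under action a and observation z."""
    image = []
    for b in extreme_points:
        b_next, p_z = tau(b, a, z, T, O)
        if p_z > 0:  # extreme points that cannot produce z are dropped
            image.append(b_next)
    return image
```

Since each extreme point contributes at most one point to the image, the result never has more extreme points than the original simplex.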

It is clear that $\bigcup_{z \in Z_a} \tau(B'_j, a, z)$ is the set of belief states reachable from inside $B'_j$ by
executing action a. Let $A_j$ be the set of actions that $\pi$ prescribes for belief states in $B'_j$.
Then $\bigcup_{j=1}^{m} \bigcup_{a \in A_j} \bigcup_{z \in Z_a} \tau(B'_j, a, z)$ contains all the belief states whose values under V
influence the values of U inside B'.

SPVI expands B' by including more belief simplexes: for each j, each a in $A_j$, and
each z in $Z_a$, it adds the simplex $\tau(B'_j, a, z)$ to the collection of simplexes if it is not a
subset of any existing simplex.



3.3 Feasibility 

SPVI would not be computationally feasible if the number of belief simplexes increases 
quickly. In this subsection, we argue that this is not the case thanks to the properties of 
almost-discernible POMDPs and the way the initial belief simplexes are chosen. 

First note that when a is the information-opulent action d, the simplex $\tau(B'_j, d, z)$ is
a subset of $B_{dz}$, which is in the collection to begin with. Consequently, it is not added
to the collection. Second, each $B'_j$ is small in size. The information-void actions that
the V-improving policy prescribes for belief states are usually only a fraction of all
information-void actions. Moreover, each information-void action has only one possible
observation, namely the void observation. All these point to the conclusion that the
number of belief simplexes does not increase quickly.

Although SPVI is proposed as an anytime algorithm, we argue below that, given sufficient
time, SPVI will eventually converge in the sense that the underlying belief subspace
will stop growing. Define a history to be an ordered sequence of action-observation pairs.
For any belief subspace B and any history h, let $\tau(B, h)$ be the set of belief states that
are possible when the history h is realized starting from inside B. Using this notation,
we can rewrite each initial belief simplex $B_{dz}$ as $\tau(B, [d, z])$, where [d, z] is the one-step
history where z is observed after taking action d. Similarly, $\tau(B'_j, a, z)$ can be rewritten
as $\tau(B'_j, [a, z])$.

It is easy to see that each belief simplex that SPVI encounters can be written as
$\tau(B, h)$ for some history h. As already pointed out, each initial belief simplex $B_{dz}$ can be
written as $\tau(B, [d, z])$. Now if $B'_j$ is $\tau(B, h)$, then $\tau(B'_j, a, z)$ can be written as $\tau(B, h')$,
where h' = h[a, z] is obtained by appending the pair [a, z] to the end of h.

Let $\tau(B, h)$ be a particular belief simplex that SPVI encounters. We claim that all
actions in h are information-void except for the first one. This is trivially true if $\tau(B, h)$ is
an initial belief simplex $\tau(B, [d, z])$. Suppose the statement is true if the length of h is k.
Consider expanding $\tau(B, h)$. For each action a and each $z \in Z_a$, we get a new simplex
$\tau(B, h[a, z])$. When a is d, the new simplex is a subset of the initial belief simplex $B_{dz}$
and hence is not added to the collection of simplexes. When a is not d, all actions in h[a, z]
are information-void except for the first one. So the claim follows by induction.

Intuitively, after a long sequence of information-void actions, one should be quite 
uncertain about the current state of the world. As such, good policies should prescribe the 
information-opulent action for belief states in $\tau(B, h)$ when h is long. The V-improving
policy intuitively should be good as the underlying subspace grows large and hence
should prescribe only the information-opulent action for belief states in $\tau(B, h)$ when
h is long. When this happens, no new belief simplexes will be produced from $\tau(B, h)$.
Consequently, SPVI should eventually stop introducing new belief simplexes and hence 
converge. 

It should be noted that the foregoing arguments do not constitute proofs. The con- 
clusions drawn might not be true in all cases. However, they are indeed found to be true 
in the problems used in our experiments. 



3.4 Optimality 

Assume SPVI does eventually converge. In this subsection, we argue that SPVI can find 
near-optimal policies. 

Let B' be the final belief subspace, V and U be the second last and the last value
functions, and $\pi$ be the V-improving policy. The way that SPVI expands a subspace and
the fact that B' is the final subspace imply that B' is closed under $\pi$ in the following
sense: starting from inside B', the policy does not lead to belief states outside B'. Let
$\Pi_{B'}$ be the set of all policies under which B' is closed. In the following, we prove that $\pi$
is near-optimal in B' among all policies in $\Pi_{B'}$, i.e. $V^\pi(b)$ is close to $\max_{\pi' \in \Pi_{B'}} V^{\pi'}(b)$
for any belief state b in B'.

The POMDP under discussion can be transformed into an MDP over the belief space
B. Denote this MDP by $\mathcal{M}$. Define another MDP $\mathcal{M}'$ that is the same as $\mathcal{M}$ except that
its state space is B' and possible actions for each belief state in B' include only those
that do not lead to belief states outside B'. Let V' and $\pi'$ be respectively the restrictions
of V and $\pi$ onto B'. Imagine carrying out value iteration for $\mathcal{M}'$ starting from V'. Let
U' be the value function obtained after the first iteration. The fact that B' is closed under
$\pi$ implies that $\pi'$ is a legitimate policy for $\mathcal{M}'$ and is V'-improving. It also implies that
U'(b) = U(b) for any b in B'. Hence $\max_{b \in B'} |V'(b) - U'(b)| = \max_{b \in B'} |V(b) - U(b)|$.
If restricted value iteration was terminated when the latter quantity is small, then $\pi'$ is a
near-optimal policy for $\mathcal{M}'$. Together with the fact that all policies in $\Pi_{B'}$, when restricted
to B', are policies for $\mathcal{M}'$, this allows us to conclude that $\pi$ is near-optimal (for the original
POMDP) in B' among policies in $\Pi_{B'}$.

The intuition that near-optimal policies should prescribe the information-opulent
action for belief states in $\tau(B, h)$ when h is long gives us some reason to believe that the
set $\Pi_{B'}$ contains near-optimal policies. If this is the case, the policy $\pi$ is near-optimal in
B' among all policies. Since one can always get into B' by taking the information-opulent
action, $\pi$ is near-optimal in the entire belief space.

As in the previous subsection, we wish to note that some of the foregoing arguments 
rely strongly on intuitions. The conclusion drawn might not be true in all cases. However, 
our experiments do indicate that SPVI can find near-optimal policies after a few subspace 
expansions. 



4 Empirical Results 



We have tested SPVI on a number of problems. The results are encouraging. Due to
space limitations, we discuss only the results on a simple maze game. The layout of the maze
is shown in Figure 1. There are 11 states: 10 locations plus one terminal state. An agent
needs to move to location 9 and declare goal. There are six actions: four “move” actions
plus look and declare-goal. The look action is information-opulent (2-focal) and all
other actions are information-void.

The “move” actions allow the agent to move in each of the four nominal directions
and have an 80% chance of achieving their intended effects, i.e. moving the agent one step
in a certain direction. Moving against maze walls leaves the agent at its original location.
These actions have no rewards. At locations 9 and 10, the declare-goal action yields
rewards of +1 and -1 respectively and it moves the game into the terminal state. In all
other states, it does not cause state transitions and has no reward. The look action does
not have rewards and does not cause state transitions. It produces observations except in
the terminal state. The observation it produces at a location is a sequence of four letters
indicating, for each of the four directions, whether there is a wall (w) or nothing (o). There
are 4 pairs of locations that have the same observations. For example, locations 2 and
5 both have observation owow, meaning that there are walls in the South and North and
the other two directions are open. Observations for the other 2 locations are unique.
Fig. 1. Layout of test problem domain.



We have run SPVI on two versions of the maze problem: the original problem and 
a simpler version where locations 1 and 6 are deleted. The two versions will be referred 
to as Maze and Maze1 respectively. The point-based improvement technique described
in Zhang and Zhang [11] is incorporated to speed up the convergence of restricted value
iteration. The discount factor is set at 0.95. After restricted value iteration converges in 
a belief subspace, simulation is conducted to determine the quality of the policy found. 
The simulation consists of 1000 trials. Each trial starts from a random initial belief state 
and is allowed to run up to 20 steps. The average reward across all the trials is used as a 
measurement of policy quality. 
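
A simulation of this kind could be organised as in the Python sketch below. Several details are assumptions of the sketch rather than facts from the experiments: initial beliefs are drawn from a supplied list, rewards are accumulated with discounting, the true state is sampled from the initial belief, and the handling of the terminal state is omitted. The model layout `T[a][s][s2]`, `O[a][s2][z]`, `R[s][a]` and all names are hypothetical.

```python
import random

def sample_index(probs, rng):
    """Draw an index according to a discrete distribution."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against rounding error

def belief_update(b, a, z, T, O):
    n = len(b)
    u = [O[a][s2][z] * sum(T[a][s][s2] * b[s] for s in range(n)) for s2 in range(n)]
    p_z = sum(u)
    return [x / p_z for x in u]

def evaluate_policy(policy, initial_beliefs, T, O, R, lam,
                    trials=1000, horizon=20, seed=0):
    """Average (discounted, by assumption) reward of a policy over simulated trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        b = list(rng.choice(initial_beliefs))
        s = sample_index(b, rng)  # hidden true state, consistent with b
        ret, discount = 0.0, 1.0
        for _step in range(horizon):
            a = policy(b)  # the agent sees only its belief state
            ret += discount * R[s][a]
            discount *= lam
            s = sample_index(T[a][s], rng)
            z = sample_index(O[a][s], rng)
            b = belief_update(b, a, z, T, O)
        total += ret
    return total / trials
```

Fixing the seed makes repeated evaluations of different policies comparable, although sampling noise between policies of similar quality remains.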

The charts in Figure 2 depict quality of policies SPVI found in the belief subspaces 
against the time taken. Simulation time is not included. There is one data point for 
each belief subspace. What is shown for a subspace is not necessarily the quality of the 
policy found in that subspace. Rather, it is the quality of the best policy found so far. 
We present data this way for the following reason. As subspace grows, SPVI computes 
better and better value functions. However, as is well known, better value function does 
not necessarily imply better policy. In other words, the policy found in a larger subspace 
is not necessarily better than the one found in a smaller subspace. Knowing this fact, 
one naturally would want to keep the best policy so far. 

In Maze1, SPVI converged after 11 expansions and the final number of simplexes
is 100. In Maze, it converged after 22 expansions and the final number of simplexes is
432. These support our claims that the number of simplexes does not grow quickly and
subspace expansion will eventually terminate.

Using the algorithm described in Zhang and Zhang [11], we were able to compute 
0.1-optimal policies for Maze1 in 6,457 seconds. This provides us with a benchmark to
judge how close the policies SPVI found are to the optimal. From Figure 2, we see that
SPVI found a near-optimal policy for Maze1 in less than 50 seconds after two subspace
expansions. This is much less time than 6,457 seconds.

For Maze, the algorithm described in Zhang and Zhang [11] did not converge after 
24 hours. On the other hand, SPVI was able to find a policy whose average reward is 
0.46 in 13 seconds after only one subspace expansion and another policy whose average 
reward is 0.48 in 520 seconds after 6 subspace expansions. Since the optimal average 



^ One might notice that some of the policies found by SPVI appear to be better than the 0.1- 
optimal policy. This is probably due to randomness in simulation. 
Fig. 2. Performance of SPVI in two almost-discernible POMDPs.



reward one can get in Maze should be less than that in Maze1 and the latter is 0.52, there
are good reasons to believe the two policies found by SPVI are near-optimal. 

QMDP [7] is a simple approximation method whose performance has been shown 
to be fairly good compared to other approximation methods [5]. It is computationally 
much cheaper than SPVI. Naturally, we are interested in how the two methods differ in 
terms of quality of policies they obtain. From Figure 2, it is clear that SPVI found much 
better policies than QMDP. 
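
For comparison, the QMDP heuristic as it is commonly described can be sketched in a few lines of Python: solve the underlying fully observable MDP for its Q-values and, at a belief state, choose the action with the highest belief-weighted Q-value. This is an illustration of that general idea, not the implementation used in the experiments; the layout `T[a][s][s2]`, `R[s][a]` and the fixed number of iterations are assumptions.

```python
def mdp_q_values(T, R, lam, iterations=200):
    """Q-values of the underlying MDP, computed by plain value iteration."""
    n_states, n_actions = len(R), len(R[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iterations):
        V = [max(row) for row in Q]
        Q = [[R[s][a] + lam * sum(T[a][s][s2] * V[s2] for s2 in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
    return Q

def qmdp_action(b, Q):
    """Action maximizing sum_s b(s) * Q(s, a)."""
    n_actions = len(Q[0])
    return max(range(n_actions),
               key=lambda a: sum(b[s] * Q[s][a] for s in range(len(b))))
```

Because the Q-values assume that the state becomes fully known after one step, an action whose only benefit is the information it yields never looks better than acting immediately.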

The poor quality of the QMDP approximation in our test problems can easily be 
explained. The symmetric nature of the domain means that our agent can easily confuse
the left and right halves. The only way to resolve the uncertainty is to move to
either location 1 or location 6. Policies produced by QMDP simply do not have such
a sophisticated information-gathering strategy.



5 Near-Discernible POMDPs 



The conditions that define almost-discernible POMDPs are rather restrictive. In this
section, we relax those conditions to define a more general class of POMDPs called 
near-discernible POMDPs and show how SPVI can be adapted for such POMDPs. 

We say that an action a is information-rich if, for each observation z in $Z_a$, the
probability P(z|s, a) is close to zero except for a small number of states. On the other
hand, if P(z|s, a) is significantly larger than zero for a large number of states, then we
say that a is information-poor. A POMDP is near-discernible if its actions are either 
information-rich or information-poor. To simplify exposition, we assume that there is 
only one information-rich action and we denote this action by d. 

One difficulty of applying SPVI to near-discernible POMDPs is that the set $S_{dz}$ can
be large in cardinality. When this is the case, the belief simplex $B_{dz}$ is also large and
consequently the complexity of SPVI would be high if it starts with the belief subspace
$\bigcup_{z \in Z_d} B_{dz}$. To overcome this difficulty, we introduce a number $\delta$ that is close to zero and
define $S^\delta_{dz} = \{s : P(z|s, d) > \delta\}$. By the definition of information-rich actions, the set
$S^\delta_{dz}$ should be small. We further define $B^\delta_{dz} = \{b : b(s) = 0\ \forall s \notin S^\delta_{dz}\}$. SPVI starts
with the belief subspace $\bigcup_{z \in Z_d} B^\delta_{dz}$. In our experiments, $\delta$ is set at 0.1.
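
The effect of the threshold can be seen by computing the support sets with and without it. In the Python sketch below, the observation model is stored as `O[d][s][z]` = P(z|s, d); this layout and the function name are assumptions made for illustration.

```python
def thresholded_supports(d, O, delta):
    """Map each observation z to {s : P(z|s, d) > delta}, omitting empty sets."""
    n_states, n_obs = len(O[d]), len(O[d][0])
    supports = {}
    for z in range(n_obs):
        S = {s for s in range(n_states) if O[d][s][z] > delta}
        if S:
            supports[z] = S
    return supports
```

With delta equal to 0 the function returns the exact support sets, so a noisy observation model in which every observation has a small probability in every state yields sets as large as the whole state space, whereas a small positive delta keeps only the states that are likely to have produced each observation.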
Fig. 3. Performance of SPVI in two near-discernible POMDPs.



The above way of choosing the initial belief subspace gives rise to another issue: new
belief simplexes $\tau(B'_j, d, z)$ generated by the information-rich action are not necessarily
subsets of the initial belief simplexes $B^\delta_{dz}$. If belief subspaces are expanded in the same way
as in almost-discernible POMDPs, the number of belief simplexes will increase quickly.
To avoid this problem, new belief simplexes $\tau(B'_j, d, z)$ generated by the information-rich
action are simply discarded.

To evaluate the performance of SPVI in near-discernible POMDPs, we modified the 
two problems described in the previous section. The only changes are in the observation 
probability of the look action. In the original problems, look produces, at each location, a
string of four characters with probability 1. Let us say that the string is the ideal string
for the location. In the modified problems, look produces, at each location, the ideal 
string for that location with probability around 0.8. With probability 0.05, it produces 
the void observation. Also with probability 0.05, it produces a string that is ideal for some 
other location and that differs from the ideal string for the current location by no more 
than 2 characters. After the modifications, look is no longer an information-opulent 
action. It produces the void observation with nonzero probability at all states. 

The performance of SPVI in the two modified problems is shown in Figure 3. In 
modified Maze1, SPVI converged after 11 subspace expansions. In modified Maze, it
was manually terminated after 9 subspace expansions. Intuitively, the maximum average 
rewards one can get in the modified problems should be less than those in the original 
problems. This and a quick comparison of Figures 2 and 3 suggest that the policies that 
SPVI found for the modified problems after two subspace expansions are near-optimal. 
They are much better than the policies found by QMDP. 



6 Conclusions 

Finding optimal policies for general partially observable Markov decision processes 
(POMDPs) is computationally difficult primarily due to the need to perform dynamic- 
programming (DP) updates over the entire belief space. In this paper, we first studied a 
somewhat restrictive class of special POMDPs called almost-discernible POMDPs and 
proposed an anytime algorithm called space-progressive value iteration (SPVI). SPVI 
does not perform DP updates over the entire belief space. Rather it restricts DP updates to 
a belief subspace that grows over time. It was argued that given sufficient time SPVI can
find near-optimal policies for almost-discernible POMDPs. We then showed how SPVI
can be applied to a more general class of POMDPs called near-discernible POMDPs,
which is arguably general enough for many applications. We have evaluated SPVI on a
number of test problems. For those that are solvable by previous exact algorithms, SPVI
was able to find near-optimal policies in much less time. For the others, SPVI was still
able to find, within acceptable time, policies that are arguably near-optimal.
Abstract. The design of autonomous agents that are situated in real
world domains involves dealing with uncertainty in terms of dynamism, 
observability and non-determinism. These three types of uncertainty, 
when combined with the real-time requirements of many application do- 
mains, imply that an agent must be capable of effectively coordinating its 
reasoning. As such, situated belief-desire-intention (bdi) agents need an 
efficient intention reconsideration policy, which defines when computa- 
tional resources are spent on reasoning, i.e., deliberating over intentions, 
and when resources are better spent on either object-level reasoning or 
action. This paper presents an implementation of such a policy by mod- 
elling intention reconsideration as a partially observable Markov decision 
process (pomdp). The motivation for a pomdp implementation of intention
reconsideration is that the two processes have similar properties
and functions, as we demonstrate in this paper. Our approach achieves 
better results than existing intention reconsideration frameworks, as is 
demonstrated empirically in this paper. 



1 Introduction 

One of the key problems in the design of belief-desire-intention (bdi) agents is 
the selection of an intention reconsideration policy [3, 8]. Such a policy defines 
the circumstances under which a bdi agent will expend computational resources 
deliberating over its intentions. Wasted effort — deliberating over intentions 
unnecessarily — is undesirable, as is not deliberating when such deliberation 
would have been fruitful. There is currently no consensus on exactly how or when 
an agent should reconsider its intentions. Current approaches to this problem 
simply dictate the commitment level of the agent, ranging from cautious (agents 
that reconsider their intentions at every possible opportunity) to bold (agents 
that do not reconsider until they have fully executed their current plan). Kinny 
and Georgeff investigated the effectiveness of these two policies in several types 
of environments [3]; their analysis has been extended by others [8].

Our objective in this paper is to demonstrate how to model intention recon- 
sideration in belief-desire-intention (bdi) agents by using the theory of Markov 
decision processes for planning in partially observable stochastic domains. We 
view an intention reconsideration strategy as a policy in a partially observable
Markov decision process (pomdp): solving the pomdp thus means finding an 
optimal intention reconsideration strategy. We have shown in previous work [8] 
that an agent’s optimal rate of reconsideration depends on the environment’s 
dynamism - the rate of change of the environment, determinism - the degree of 
predictability of the system behaviour for identical system inputs, and observ- 
ability - the extent to which the agent has access to the state of the environment. 
The motivation for using a pomdp approach here is that in the pomdp framework 
the optimality of a policy is largely based on exactly these three environmental 
characteristics. 

The remainder of this paper is structured as follows. We begin by providing 
some background information on the bdi framework in which the problem of 
intention reconsideration arises. In Section 3 we discuss the Markov decision 
framework upon which our approach builds and present the implementation of 
intention reconsideration with a pomdp. In Section 4 we empirically evaluate 
our model in an agent testbed. Finally, in Section 5 we present some conclusions 
and describe related and future work. 



2 Belief-Desire-Intention Agents 

The idea of applying the concepts of beliefs, desires and intentions to agents 
originates in the work of Bratman [2] and Rao and Georgeff [6]. In this paper, 
we use the conceptual model of bdi agency as developed by Wooldridge and 
Parsons [10]. The model distinguishes two main data structures in an agent: a 
belief set and an intention set¹. An agent’s beliefs represent information that the
agent has about its environment, and may be partial or incorrect. Intentions can 
be seen as states of affairs that an agent has committed to bringing about. We 
regard an intention as a simple unconditional plan. The behaviour of the agent 
is generated by four main components: a next-state function, which updates 
the agent’s beliefs on the basis of an observation made of the environment; a 
deliberation function, which constructs a set of appropriate intentions on the 
basis of the agent’s current beliefs and intentions; an action function, which 
selects and executes an action that ultimately satisfies one or more of the agent’s 
intentions; and a meta-level control function, the sole purpose of which is to 
decide whether to pass control to the deliberation or action subsystems. On 
any given control cycle, an agent begins by updating its beliefs through its 
next-state function, and then, on the basis of its current beliefs, the meta-level 
control function decides whether to pass control to the deliberation function (in 
which case the agent expends computational resources by deliberating over its 
intentions), or else to the action subsystem (in which case the agent acts). As a 
general rule of thumb, an agent’s meta-level control system should pass control
to the deliberation function when the agent will change intentions as a result;
otherwise, the time spent deliberating is wasted. Investigating how this choice is
made rationally and efficiently is the main motivation behind the work presented
in this paper.

¹ Since desires do not directly contribute to our analytical discussion of intention
reconsideration, they are left out of the conceptual bdi model in this paper. This
decision is clarified in [10].
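The control cycle described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the names `Agent`, `control_cycle`, `toy_next_state`, `toy_deliberate`, `ideal_meta_control` and `toy_act` are hypothetical, and the toy domain (each proposition is a hole, each belief the probability that the hole exists) is an assumption made only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

Beliefs = Dict[str, float]
Intentions = Set[str]


@dataclass
class Agent:
    """Internal state of a bdi agent: beliefs plus intentions (hypothetical encoding)."""
    beliefs: Beliefs
    intentions: Intentions = field(default_factory=set)


def control_cycle(
    agent: Agent,
    observation: Beliefs,
    next_state: Callable[[Beliefs, Beliefs], Beliefs],
    meta_control: Callable[[Beliefs, Intentions], bool],
    deliberate: Callable[[Beliefs, Intentions], Intentions],
    act: Callable[[Beliefs, Intentions], None],
) -> str:
    """Run one control cycle and report which subsystem received control."""
    # 1. The next-state function revises the beliefs from the observation.
    agent.beliefs = next_state(agent.beliefs, observation)
    # 2. The meta-level control function chooses between deliberation and action.
    if meta_control(agent.beliefs, agent.intentions):
        agent.intentions = deliberate(agent.beliefs, agent.intentions)
        return "del"
    act(agent.beliefs, agent.intentions)
    return "act"


# Toy instantiation (assumption): a belief is the probability that a hole exists.
def toy_next_state(beliefs: Beliefs, observation: Beliefs) -> Beliefs:
    return {**beliefs, **observation}


def toy_deliberate(beliefs: Beliefs, intentions: Intentions) -> Intentions:
    # Intend the hole that is currently believed most likely to exist.
    return {max(sorted(beliefs), key=lambda p: beliefs[p])}


def ideal_meta_control(beliefs: Beliefs, intentions: Intentions) -> bool:
    # Rule of thumb from the text: deliberate only if intentions would change.
    # This reference version runs deliberation itself, so it is not cheap.
    return toy_deliberate(beliefs, intentions) != intentions


def toy_act(beliefs: Beliefs, intentions: Intentions) -> None:
    pass  # a real agent would execute one step of its plan here


agent = Agent(beliefs={"h1": 0.9, "h2": 0.1}, intentions={"h1"})
first = control_cycle(agent, {}, toy_next_state,
                      ideal_meta_control, toy_deliberate, toy_act)
second = control_cycle(agent, {"h1": 0.0, "h2": 0.8}, toy_next_state,
                       ideal_meta_control, toy_deliberate, toy_act)
```

In the first cycle nothing has changed, so control passes to the action subsystem; in the second cycle the observation makes the intended hole implausible, so control passes to deliberation. The reference `ideal_meta_control` decides by running deliberation itself, which is exactly the cost a practical meta-level control function has to avoid.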

We have to consider that agents do not operate in isolation: they are situated
in environments; an environment denotes everything that is external to the agent.
Let $P$ be a set of propositions denoting environment variables. In accordance
with similar proposition-based vector descriptions of states, we let environment
states be built up of such propositions. Then $E$ is the set of environment states
with members $\{e, e', \ldots\}$, and $e = \{p_1, \ldots, p_n\}$, where $p_i \in P$.

The internal state of an agent consists of beliefs and intentions. Let
$Bel : E \to [0, 1]$, where $\sum_{e \in E} Bel(e) = 1$, denote the agent’s beliefs: we represent
what the agent believes to be true of its environment by defining a probability
distribution over the possible environment states. The agent’s set of intentions,
$Int$, is a subset of the set of environment variables: $Int \subseteq P$. An internal state
$s$ is a pair $s = \langle Bel, Int \rangle$, where $Bel : E \to [0, 1]$ is a probability function and
$Int \subseteq P$ is a set of intentions. Let $S$ be the set of all internal states. For a
state $s \in S$, we refer to the beliefs in that state as $Bel_s$ and to the intentions as
$Int_s$. We assume that it is possible to denote values and costs of the outcomes
of intentions²: an intention value $V : Int \to \mathbb{R}$ represents the value of the
outcome of an intention; and an intention cost $C : Int \to \mathbb{R}$ represents the cost of
achieving the outcome of an intention. The net value $V_{net} : Int \to \mathbb{R}$ represents
the net value of the outcome of an intention; $V_{net}(i)$, where $i \in Int$, is typically
$V(i) - C(i)$. We can express how “good” it is to be in some state by assigning
a numerical value to states, called the worth of a state. We denote the worth of
a state by a function $W : S \to \mathbb{R}$, and we assume this to be based on the net
values of the outcomes of the intentions in a state. Moreover, we assume that one
state has a higher worth than another state if the net values of all its intentions
are higher. This means that, for all $s, s' \in S$, if $V_{net}(i) > V_{net}(i')$ for all
$i \in Int_s$ and all $i' \in Int_{s'}$, then $W(s) > W(s')$. In the empirical investigation
discussed in this paper, we illustrate that a conversion from intention values to
state worths is feasible, though we do not explore the issue here³. Finally, $Ac$
denotes the set of physical actions the agent is able to perform; with every
$a \in Ac$ we identify a set of propositions $P_a \subseteq P$, which includes the propositions
that change value when $a$ is executed.
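To make the relation between net values and worth concrete, here is a small sketch. The text does not fix a particular worth function, so the choice below (worth is the best net value among a state's intentions) is only one assumption that happens to respect the stated ordering; the function names and the numbers are hypothetical.

```python
from typing import Iterable


def net_value(value: float, cost: float) -> float:
    """V_net(i) = V(i) - C(i) for the outcome of a single intention."""
    return value - cost


def dominates(net_s: Iterable[float], net_t: Iterable[float]) -> bool:
    """True if every intention in s has a higher net value than every intention in t."""
    net_t = list(net_t)
    return all(v > w for v in net_s for w in net_t)


def worth(net_values: Iterable[float]) -> float:
    # ASSUMPTION: W(s) is the best net value among the intentions in s
    # (requires a non-empty intention set). The text only requires W to
    # respect the dominance ordering; other choices are possible.
    return max(net_values)


# Hypothetical numbers: delivering coffee is worth 10 and costs 2 to achieve.
s = [net_value(10, 2)]
t = [net_value(5, 1), net_value(6, 3)]
```

Here every net value in `s` exceeds every net value in `t`, so `s` dominates `t` and its worth is higher, as the assumption on $W$ requires.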

In this conceptual model, the question of intention reconsideration thus basically
boils down to the implementation of the meta-level control function. On
every control cycle, the agent must decide whether to act upon its current
intentions or to adopt new intentions, and this is decided by the meta-level
control function. We continue by discussing how this implementation can be
done using Markov decision processes.

² We clearly distinguish intentions from their outcome states and we do not give values
to intentions themselves, but rather to their outcomes. For example, when an agent
intends to deliver coffee, an outcome of that intention is the state in which coffee has
been delivered.

³ Notice that this problem is the inverse of the utilitarian lifting problem: the problem
of how to lift utilities over states to desires over sets of states. Discussing the lifting
problem, and its inverse, is beyond the scope of this paper, and therefore we direct
the interested reader to the work of Lang et al. [4].



3 Implementing Intention Reconsideration as a pomdp 



The main point of our formalisation of intention reconsideration is its
implementation as a pomdp. Because the optimality of a pomdp policy is based
on the environment’s observability, determinism and dynamism, the framework is
appropriate in the context of intention reconsideration. In this section, we
explain what a pomdp is and how to use it for implementing intention
reconsideration.

A partially observable Markov Decision Process (pomdp) can be understood 
as a system that at any point in time can be in any one of a number of distinct 
states, in which the system’s state changes over time resulting from actions, 
and where the current state of the system cannot be determined with complete 
certainty [1]. Partially observable mdps satisfy the Markov assumption so that 
knowledge of the current state renders information about the past irrelevant to 
making predictions about the future. In a pomdp, we represent the fact that 
the knowledge of the agent is not complete by defining a probability distribution 
over all possible states. An agent then updates this distribution when it observes 
its environment. 

Let a set of states be denoted by $S$ and let this set correspond to the set of
the agent’s internal states as defined above. This means that a state in the mdp
represents an internal state of the agent. We let the set of actions be denoted by
$A$. (We later show that $A \neq Ac$ in our model.) An agent might not have complete
knowledge of its environment, and must thus observe its surroundings in order to
acquire knowledge: let $\Omega$ be a finite set of observations that the agent can make
of the environment. We introduce an observation function $O : S \times A \to \Pi(\Omega)$
that defines a probability distribution over the set of observations; this function
represents what observations an agent can make resulting from performing an
action $a \in A$ in a state $s \in S$. The agent receives rewards for performing actions
in certain states: this is represented by a reward function $R : S \times A \to \mathbb{R}$. Finally,
a state transition function $\tau : S \times A \to \Pi(S)$ defines a probability distribution
over states resulting from performing an action in a state; this enables us to
model non-deterministic actions.
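How the probability distribution over states is revised after an observation can be sketched with the standard Bayesian belief update for pomdps. The code below is an illustration under assumptions: it follows the common convention in which the observation probability is conditioned on the action and the resulting state, the dictionary-based encoding of $\tau$ and $O$ is hypothetical, and the two-state example numbers are invented.

```python
from typing import Dict, Hashable, Tuple

Dist = Dict[Hashable, float]


def belief_update(
    belief: Dist,
    action: str,
    observation: Hashable,
    tau: Dict[Tuple[Hashable, str], Dist],
    obs: Dict[Tuple[Hashable, str], Dist],
) -> Dist:
    """Return the belief after performing `action` and seeing `observation`.

    belief maps every state to its probability; tau[(s, a)] is a distribution
    over successor states; obs[(s', a)] is a distribution over observations.
    """
    unnormalised = {}
    for s_next in belief:
        # Probability of reaching s_next from the current belief.
        predicted = sum(belief[s] * tau[(s, action)].get(s_next, 0.0) for s in belief)
        # Weight the prediction by the likelihood of the observation.
        unnormalised[s_next] = obs[(s_next, action)].get(observation, 0.0) * predicted
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: p / total for s, p in unnormalised.items()}


# Invented example: does a hole still exist after the agent has acted once?
tau = {
    ("hole", "act"): {"hole": 0.9, "no_hole": 0.1},
    ("no_hole", "act"): {"no_hole": 1.0},
}
obs = {
    ("hole", "act"): {"see": 0.8, "none": 0.2},
    ("no_hole", "act"): {"see": 0.1, "none": 0.9},
}
updated = belief_update({"hole": 0.5, "no_hole": 0.5}, "act", "see", tau, obs)
```

With these numbers the belief that the hole exists rises from 0.5 to roughly 0.87 after the agent sees it, because the sighting is much more likely when the hole is really there.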

Having defined these sets, we solve a pomdp by computing an optimal policy:
an assignment of an action to each possible belief state such that the expected
sum of rewards gained along the possible trajectories in the pomdp is maximal.
Optimal policies can be computed by applying dynamic programming
methods to the pomdp, based on backwards induction; value iteration and policy
iteration are the best-known algorithms for solving pomdps [1]. A major
drawback of applying pomdps is that these kinds of algorithms tend to be
computationally intractable; we later return to the issue of computational complexity as it relates
to our model.

Intention Reconsideration as a pomdp



We regard the bdi agent as a domain dependent object level reasoner, concerned
directly with performing the best action for each possible situation; the pomdp
framework is then used as a domain independent meta level reasoning component,
which lets the agent reconsider its intentions effectively. We define a meta
level bdi-pomdp as a tuple $\langle S, A, \Omega, O, R, \tau \rangle$. We have explained above that a
state $s \in S$ in this model denotes an internal state of the agent, containing a
belief part and an intention part. As intention reconsideration is mainly concerned
with states, actions and rewards, we leave the implementation of the observations
$\Omega$, the observation function $O$ and the state transition function $\tau$ to the designer
for now.

Since the pomdp is used to model intention reconsideration, we are only
concerned with two possible meta level actions, $A = \{act, del\}$: the agent either
performs an object level action ($act$) or deliberates ($del$).
Because the optimality criterion of policies depends on the reward structure of
the pomdp, we define the rewards for action $act$ and deliberation $del$ in state
$s \in S$ as follows:

$$R(s, a) = \begin{cases} W(s_{int}) & \text{if } a = act \\ W(s) & \text{if } a = del \end{cases}$$

where $s_{int} \in S$ refers to the state the agent intends to be in while currently
being in state $s$. Imagine a robot that has just picked up an item which has to
be delivered at some location. The agent has adopted the intention to deliver
the item, i.e., to travel to that location and to drop off the item. The reward for
deliberation is the worth of the agent’s current state (e.g., 0) whereas the reward
for action is the worth of the intended state (e.g., 10) for having delivered the
item. The robot consequently acts, which brings it closer to achieving its “correct”
intentions. Intentions are correct if the agent does not waste effort while acting
upon them. An agent wastes effort if it deliberates over its intentions
unnecessarily. If an agent does not deliberate when that would have been necessary,
the agent has wrong intentions. The reward for acting is thus the worth of the
state that the agent intends to reach, whereas the reward for deliberation is the
worth of the state as it currently is.

This structure of reward agrees with the intuition that the agent eventually 
receives a reward if it has correct intentions, it receives no reward if it has wrong 
intentions, and it receives no direct reward for deliberation. With respect to this 
last intuition, however, we must mention that the “real” reward for deliberation 
is indirectly defined, by the very nature of pomdps, as the expected worth of 
future states in which the agent has correct intentions. As intentions resist recon- 
sideration [2], the agent prefers action over deliberation and the implementation 
of the reward structure should thus favour action if the rewards are equivalent. 
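The reward structure and the preference for action on ties can be written down directly. The following is a sketch only: `reward` and `choose_meta_action` are hypothetical helper names, and the worth table reuses the illustrative values 0 and 10 from the delivery robot example.

```python
from typing import Callable, Hashable

ACT, DEL = "act", "del"


def reward(
    worth: Callable[[Hashable], float],
    state: Hashable,
    intended_state: Hashable,
    meta_action: str,
) -> float:
    """R(s, a): worth of the intended state for act, worth of the current state for del."""
    if meta_action == ACT:
        return worth(intended_state)
    if meta_action == DEL:
        return worth(state)
    raise ValueError(f"unknown meta action: {meta_action}")


def choose_meta_action(expected_act: float, expected_del: float) -> str:
    """Intentions resist reconsideration: deliberate only if it is strictly better."""
    return DEL if expected_del > expected_act else ACT


# Delivery robot from the text: current state worth 0, intended state worth 10.
worth_table = {"holding_item": 0.0, "item_delivered": 10.0}
r_act = reward(worth_table.get, "holding_item", "item_delivered", ACT)
r_del = reward(worth_table.get, "holding_item", "item_delivered", DEL)
```

The strict inequality in `choose_meta_action` is what makes the agent act when the two expected rewards are equal.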

For illustrative purposes, consider the simple deterministic mdp in Figure 1.
This figure shows a 5 × 1 gridworld, in which an agent can move either right or
left or stay at its current location. The agent’s current location is indicated with
a square and the location it intends to travel to is denoted by a circle. Assume
the agent is currently in state $s_1$: its location is cell 4 and it intends to visit cell
1. Action will get the agent closer to cell 1: it executes a move left action which
results in state $s_2$. Deliberation results in dropping the intention to travel to cell
1, and adopting the intention to travel to cell 5 instead; this results in state $s_3$.
Obviously, deliberation is the best meta action here and the expected rewards
for the meta actions in $s_1$ reflect this: the expected reward for deliberation is
higher than the one for action. In all other states, these expected rewards are
equivalent, which means that the agent acts in all other states.

Fig. 1. A 5 × 1 gridworld example which illustrates the definition of rewards in a
bdi-pomdp. Rewards, being either 0 or 10, are indicated per location. With each state we
have indicated the expected reward for executing a physical action and for deliberation;
the best meta action to execute is indicated in square brackets.
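A gridworld of this kind can be solved with a few lines of value iteration. The sketch below is not a reproduction of Figure 1: the reward layout (only cell 5 carries a reward of 10), the discount factor, the deliberation rule (adopt the cell with the best reward minus distance) and all identifiers are assumptions introduced for illustration.

```python
from itertools import product

CELLS = range(1, 6)
CELL_REWARD = {1: 0, 2: 0, 3: 0, 4: 0, 5: 10}  # ASSUMPTION: only cell 5 is rewarded
GAMMA = 0.9                                    # ASSUMPTION: discount factor
ACT, DEL = "act", "del"


def worth(state):
    """Worth of an internal state (location, intended cell): reward of the location."""
    location, _ = state
    return CELL_REWARD[location]


def transition(state, meta_action):
    """Deterministic transition: act moves one cell towards the intended cell,
    del keeps the location and adopts the cell with the best reward minus distance."""
    location, intent = state
    if meta_action == ACT:
        step = (intent > location) - (intent < location)
        return (location + step, intent)
    best = max(CELLS, key=lambda c: CELL_REWARD[c] - abs(c - location))
    return (location, best)


def reward(state, meta_action):
    """R(s, act) = W(s_int) and R(s, del) = W(s)."""
    _, intent = state
    return worth((intent, intent)) if meta_action == ACT else worth(state)


def solve(iterations=50):
    """Value iteration over all (location, intent) pairs; returns policy and Q-values."""
    states = list(product(CELLS, CELLS))
    value = {s: 0.0 for s in states}
    for _ in range(iterations):
        q = {(s, a): reward(s, a) + GAMMA * value[transition(s, a)]
             for s in states for a in (ACT, DEL)}
        value = {s: max(q[(s, ACT)], q[(s, DEL)]) for s in states}
    # Ties are resolved in favour of acting.
    policy = {s: DEL if q[(s, DEL)] > q[(s, ACT)] else ACT for s in states}
    return policy, q


policy, q = solve()
```

Under these assumptions the policy deliberates in the state where the agent is in cell 4 and intends cell 1, and acts once it intends cell 5. At run time the agent only needs the lookup `policy[state]`.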

Solving a bdi-pomdp means obtaining an optimal intention reconsideration 
policy: at any possible state the agent might find itself in, this policy tells the
agent either to act or to deliberate. The main contribution of our work is that 
our approach gives a well-founded means of establishing a domain dependent 
optimal reconsideration strategy. Thus the agent is programmed with a domain 
independent strategy, which it uses to compute a domain dependent strategy 
off-line, and then executes it on-line. Until now, empirical research on meta level 
reasoning aimed at efficient intention reconsideration has, to the best of our 
knowledge, involved hardwiring agents with domain dependent strategies. 

It is important that deciding whether to reconsider intentions or not is com- 
putationally cheap compared to the deliberation process itself [10]; otherwise 
it is just as efficient to deliberate at any possible moment. Using a pomdp to 
determine the reconsideration policy satisfies this criterion, since it clearly
distinguishes between design time computation, i.e., computing the policy, and
run time computation, i.e., executing the policy. We recognise that the design time
problem of computing a policy is very hard; this problem corresponds with the 
general problem of solving pomdps and we do not attempt to solve this problem 
in this paper. However, the computation that concerns us most is the run time 
computation, and in our model this merely boils down to looking up the current 
state and executing the action assigned to that state, i.e., either to act or to 
deliberate. This is a computationally cheap operation and is therefore suitable 
for run time execution. 



4 Experimental Results 

In this section, we apply our model in the Tileworld testbed [5], and show that
the model yields better results than were obtained in previous investigations of
intention reconsideration in this testbed⁴.

The Tileworld [5] is a grid environment on which there are agents and 
holes. An agent can move up, down, left, right and diagonally. Holes have to 
be visited by the agent in order for it to gain rewards. The Tileworld starts 
in some randomly generated world state and changes over time with the ap- 
pearance and disappearance of holes according to some fixed distributions. An 
agent moves about the grid one step at a time⁵. The experiments are based on
the methodology described in [8]. (We repeated the experiments described in [8]
to ensure that our results were consistent; these experiments yielded identical 
results, which are omitted here for reasons of space.) 

The Tileworld testbed is easily represented in our model. Let $L$ denote
the set of locations, i.e., $L = \{i : 1 \leq i \leq n\}$ represents the mutually disjoint
locations, where $n$ denotes the size of the grid. A proposition $p_i$ then denotes
the presence ($p_i = 1$) or absence ($p_i = 0$) of a hole at location $i$. An intention
value corresponds to the reward received by the agent for reaching a hole, and
an intention cost is the distance between the current location of the agent and
the location that the agent intends to reach. An environment state is a pair
$\langle \{p_1, \ldots, p_n\}, m \rangle$, where $\{p_1, \ldots, p_n\}$ are the propositions representing the holes
in the grid, and $m \in L$ is the current location of the agent.

Combining the $2^n \times n$ possible environment states with $n$ possible intentions
means that, adopting explicit state descriptions, the number of states is $2^n \times n^2$,
where $n$ denotes the number of locations. Computation on a state space of such
size is impractical, even for small $n$. In order to render the necessary
computations feasible, we abstracted the Tileworld state space by letting an
environment state $e$ be a pair $\langle p_1, p_2 \rangle$, where $p_1$ refers to the location of the
hole which is currently closest to the agent, and $p_2$ refers to the current
location of the agent. Then an agent’s internal state is
$\langle \langle p_1, p_2 \rangle, \{i_1\} \rangle$, where $i_1$ refers to the hole which the agent intends
to visit. This abstraction means that the size of the state space is now reduced
to $n^3$. However, the agent now has to figure out at run time which hole is the
closest in order to match its current situation to a state in the Tileworld state
space. This computation can be done in time $O(n)$, by simply checking whether
every cell is occupied by a hole or not. Because the main purpose of this example
is merely to illustrate that our model is viable, we are currently not concerned
with this increase in run time computation.

⁴ Whereas until now we have discussed non-deterministic pomdps, in the experimental
section we restrict our attention to deterministic mdps in order to compare our new
results with previous results.

⁵ Although it may be argued that the Tileworld is simplistic, it is a well-recognised
testbed for evaluating situated agents. Because of the dynamic nature of the
Tileworld, the testbed scales up to difficult and unsolvable problems.
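The effect of the abstraction, and the linear scan it requires at run time, can be sketched as follows. The code is illustrative: locations are numbered from 0 in row-major order (the text numbers them from 1), the grid width parameter and the Chebyshev distance (a diagonal step counts as one move) are assumptions, and all function names are hypothetical.

```python
from typing import List, Optional


def explicit_state_count(n: int) -> int:
    """2^n hole configurations, n agent locations and n possible intentions."""
    return (2 ** n) * n * n


def abstract_state_count(n: int) -> int:
    """Closest hole, agent location and intended hole: n choices each."""
    return n ** 3


def chebyshev(a: int, b: int, width: int) -> int:
    """Distance between two row-major cell indices when diagonal moves are allowed."""
    row_a, col_a = divmod(a, width)
    row_b, col_b = divmod(b, width)
    return max(abs(row_a - row_b), abs(col_a - col_b))


def closest_hole(holes: List[int], agent: int, width: int) -> Optional[int]:
    """Scan every cell once (O(n)) and return the index of the nearest hole, if any."""
    best, best_distance = None, None
    for location, present in enumerate(holes):
        if present:
            d = chebyshev(location, agent, width)
            if best_distance is None or d < best_distance:
                best, best_distance = location, d
    return best


# 3 x 3 grid with holes in cells 2 and 6; the agent stands in cell 7.
nearest = closest_hole([0, 0, 1, 0, 0, 0, 1, 0, 0], agent=7, width=3)
```

For a grid of only 25 cells the explicit description already has more than twenty billion states, while the abstraction has 15625.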

In [8], the performance of a range of intention reconsideration policies was
investigated in environments of differing structure. Environments were varied
by changing the degree of dynamism ($\gamma$), observability (referred to by [8] as
accessibility), and determinism. Dynamism is denoted by an integer in the range
1 to 80, representing the ratio between the world clock rate and the agent clock
rate. If $\gamma = 1$, then the world executes one cycle for every cycle executed by the
agent and the agent’s information is guaranteed to be up to date; if $\gamma > 1$ then
the information the agent has about its environment may not necessarily be up
to date. (In the experiments in this paper we assume the environment is fully
observable and deterministic.) The planning cost $p$ was varied, representing the
time cost of planning, i.e., the number of time-steps required to form a plan, and
took values 0, 1, 2, and 4.

Three dependent variables were measured: effectiveness, commitment, and
cost of acting. The effectiveness $\epsilon$ of an agent is the ratio of the actual score
achieved by the agent to the score that could in principle have been achieved.
An agent’s commitment ($\beta$) is expressed as how many actions of a plan are
executed before the agent replans. The agent’s commitment to a plan with length
$n$ is $\beta = (k - 1)/(n - 1)$, where $k$ is the number of executed actions. Observe that
commitment defines a spectrum from a cautious agent ($\beta = 0$, because $k = 1$)
to a bold one ($\beta = 1$, because $k = n$). The cost of acting is the total number of
actions the agent executes.
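The two ratio measures are simple enough to state as code. This is a sketch with hypothetical function names; the treatment of a plan of length 1, for which the commitment formula is undefined, is an assumption.

```python
def effectiveness(actual_score: float, achievable_score: float) -> float:
    """Ratio of the score achieved to the score that could have been achieved."""
    if achievable_score <= 0:
        raise ValueError("achievable score must be positive")
    return actual_score / achievable_score


def commitment(executed_actions: int, plan_length: int) -> float:
    """beta = (k - 1) / (n - 1) for k executed actions of a plan of length n."""
    if plan_length < 1 or not 1 <= executed_actions <= plan_length:
        raise ValueError("need 1 <= executed_actions <= plan_length")
    if plan_length == 1:
        # ASSUMPTION: a one-step plan is treated as fully committed,
        # because the formula divides by zero for n = 1.
        return 1.0
    return (executed_actions - 1) / (plan_length - 1)
```

For a plan of five actions, replanning after the first action gives `commitment(1, 5) == 0.0` (cautious), and executing all five gives `commitment(5, 5) == 1.0` (bold).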



Solving the Tileworld mdp off-line. To summarise, the Tileworld mdp
that we have to solve off-line consists of the following parts. As described above,
the state space $S$ contains all possible internal states of the agent. Each state
$s \in S$ is a tuple $\langle \langle p_1, p_2 \rangle, \{i_1\} \rangle$, where $p_1$ refers to the hole that is currently
closest to the agent, $p_2$ refers to the current location of the agent, and $i_1$
denotes the hole which the agent intends to visit. The set of actions is
$A = \{act, del\}$. (Note that the set of physical actions is
$Ac = \{stay, n, ne, e, se, s, sw, w, nw\}$, but that is
not of concern to us while specifying the Tileworld mdp.) Since we assume
full observability, the set of observations is $\Omega = S$. Finally, state transitions
are defined as the deterministic outcomes of executing an action $a \in A$. When
the agent deliberates in state $s$, resulting in state $s'$ (i.e., $\tau(s, del) = s'$), then
$Bel_s = Bel_{s'}$, but possibly $Int_s \neq Int_{s'}$; when the agent acts (i.e., $\tau(s, act) = s''$),
then $Int_s = Int_{s''}$, but possibly $Bel_s \neq Bel_{s''}$. Thus deliberation means that
the intention part of the agent’s internal state possibly changes, and action
means that the belief part of the agent’s internal state possibly changes (both
ceteris paribus with respect to the other part of the internal state). Although
solving mdps in general is computationally hard, we have shown above that by
appropriate abstraction of the Tileworld state space, the computations for
our Tileworld mdp become feasible.

Fig. 2. Overall effectiveness of a bdi-pomdp agent. Effectiveness is measured as the
result of a varying degree of dynamism of the world. The four curves show the
effectiveness at a planning cost (denoted by p) of 0, 1, 2 and 4. The two other curves show the
effectiveness at p = 1 and p = 2 of Kinny and Georgeff’s best reconsideration strategy
(from [3]).



Results. The experiments resulted in the graphs shown in Figures 2, 3(A) and 
3(B). In every graph, the environment’s dynamism and the agent’s planning cost 
p (for values 0, 1, 2 and 4) are varied. In Figure 2, the overall effectiveness of 
the agent is plotted. In Figure 3(A) we plotted the agent’s commitment level⁶
and in Figure 3(B) the cost of acting. 



Analysis. The most important observation we make from these experiments
is that the results presented in Figure 2 are overall better than the results
obtained in previous investigations into the effectiveness of reconsideration (as
elaborated below). Our explanation for this observation is that solving the
bdi-pomdp for our Tileworld domain delivers an optimal domain dependent
reconsideration strategy: the optimal bdi-pomdp policy lets the agent deliberate
when a hole appears that is closer than the intended hole (but not on the path to
the intended hole), and when the intended hole disappears. Kinny and Georgeff
[3] concluded that it is best for an agent to reconsider when a closer hole appears
or when the intended hole disappears. Besides this observation, we see in Figure
3(A) that our bdi-pomdp agent is able to determine its plan commitment at run
time, depending on the state of the environment. This ability contributes to
increasing the agent’s level of autonomy, since it pushes the choice of commitment
level from design time to run time.

⁶ The collected data was smoothed using a Bézier curve in order to get these
commitment graphs, because the commitment data showed heavy variation resulting from
the way dynamism is implemented. Dynamism represents the acting ratio between
the world and the agent; this ratio oscillates with the random distribution for hole
appearances, on which the commitment level depends.

Fig. 3. (A) Average commitment level for a bdi-pomdp agent. The commitment level
is plotted as a function of the dynamism of the world with planning cost (denoted by
p) of 0, 1, 2 and 4. (B) Average cost of acting for a bdi-pomdp agent. The cost of
acting (the number of time steps that the agent moves) is plotted as a function of
the dynamism of the world with planning cost (denoted by p) of 0, 1, 2 and 4.
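The reconsideration rule that the optimal policy is reported to follow can be expressed as a small predicate. The sketch below is an interpretation under assumptions: cells are (row, column) pairs, distance is the Chebyshev distance because diagonal moves are allowed, and `should_deliberate` and its arguments are hypothetical names, not part of the original testbed.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]


def distance(a: Cell, b: Cell) -> int:
    # ASSUMPTION: a diagonal step costs the same as a straight step.
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def should_deliberate(agent: Cell, intended: Cell, holes: Set[Cell], path: Set[Cell]) -> bool:
    """Deliberate if the intended hole has disappeared, or if a hole that is closer
    than the intended hole has appeared somewhere off the planned path."""
    if intended not in holes:
        return True
    limit = distance(agent, intended)
    return any(
        distance(agent, hole) < limit and hole not in path
        for hole in holes
        if hole != intended
    )


agent, intended = (0, 0), (0, 4)
path = {(0, 1), (0, 2), (0, 3), (0, 4)}
```

A closer hole on the planned path does not trigger deliberation, since the agent will pass over it anyway; a closer hole off the path, or the loss of the intended hole, does.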

Our experimental results confirm the results obtained in previous investi- 
gations on selecting an intention reconsideration strategy [3, 8, 9]: the agent’s 
effectiveness and level of commitment both decrease as the dynamism or plan- 
ning cost increases, and the cost of acting decreases as the dynamism or planning 
cost increases. 

Whereas the focus of previous research was on investigating the effectiveness
of fixed strategies in different environments, the aim of the investigation in
this paper is to illustrate the applicability of our bdi-pomdp model. Kinny and
Georgeff [3] have included empirical results for an agent that reconsiders based
on the occurrence of certain events in the environment (see [3, p87] Figures 8
and 9 for p = 2 and p = 1, respectively). Their conclusion from these results
was that it is best for an agent to reconsider when the agent observes that
either a closer hole appears or the intended hole disappears, as mentioned above.
We implemented this strategy for the agent in our testbed and obtained identical
results. We observed that an agent using our bdi-pomdp model performs better
than the agent using the mentioned fixed strategy with a realistic planning cost
($p \geq 2$). Having compared our results to the results of fixed strategies, we
conclude that, as mentioned above, our agent in effect adopts the strategy
that delivers maximum effectiveness.

In the context of flexible strategies, we compare our results to the results 
from [9], where the effectiveness of an alternative flexible strategy, based on 
discrete deliberation scheduling [7], is explored. The main conclusion we draw 
from comparing the results from the two strategies is that the empirical outcomes 
are analogous. Comparing the graphs from Figure 2 to the result graphs from [9], 
we observe that the agent’s effectiveness is generally higher for our bdi-pomdp 
model; when we compare the graphs from Figure 3(B) to the cost of acting 
graphs from [9], we see that the cost of acting is lower overall in the discrete 
deliberation model. However, in our bdi-pomdp model, the level of commitment 
is more constant, since the bdi-pomdp agent’s decision mechanism depends less 
on predictions of appearances and disappearances of holes. 

5 Discussion 

In this paper we presented a formalisation of the intention reconsideration
process in bdi agents based on the theory of pomdp planning. The motivation for
the formalism is that bdi agents in real world application domains have to
reconsider their intentions efficiently in order to be as effective as possible. It is
important that reconsideration happens autonomously, since an agent’s
commitment to its tasks changes depending on how its environment changes. The main
contribution of our model is that we deliver a meta level and domain independent
framework capable of producing optimal reconsideration policies in a variety of
domains. The model applies pomdp planning to agents; in this paper we do
not investigate how intentions can contribute to efficiently solving pomdps, but 
regard such an investigation as important further work. 

In the work presented, we show that the environmental properties of dy- 
namism, observability and determinism are crucial for an agent’s rate of inten- 
tion reconsideration. Our formalism takes all mentioned environmental proper- 
ties into account, and they form the basis of the decision mechanism of the bdi 
agent. A distinct component in the bdi agent decides whether to reconsider
or not, and we use the pomdp framework to determine an optimal reconsidera- 
tion strategy that is used for implementing this component. We leave open the 
question whether a similar result can be achieved by the construction of com- 
plex sequential and conditional plans, since this defies the very nature of the bdi 
concept. A bdi agent is concerned with the management of simple plans over 
time, thus its intelligence is located in its meta-reasoning capabilities and not in 
its planning capabilities. 

We have shown that an agent designed according to our formalism
is able to dynamically change its commitment to plans at run time, based on
the current state of the environment. (In the experiments described in
this paper, we assumed the environment to be fully observable and
deterministic, in order to compare our results with previous results.) This agent
achieves better performance than existing planning frameworks, in which the
level of plan commitment is imposed upon the agent at design time. The
bdi-pomdp model has the advantage over the deliberation scheduling model (as
used in [9]) that it computes a substantial part of the reconsideration strategy
at design time, whereas all computations for deliberation scheduling are at run
time. In contrast, the deliberation scheduling model is supposedly more flexible
in changing the reconsideration strategy at run time.

Abstract. The goal of decision-theoretic troubleshooting is to find a sequence
of actions that minimizes the expected cost of repair of a device.
If the device is complex then it is convenient to create several Bayesian 
Networks, each designed to solve a particular problem. At the beginning 
of a troubleshooting process, it is often necessary to help the user to se- 
lect the proper model. Complications arise if the user is able to give only 
a vague description of the problem. In such a case we need to work si- 
multaneously with many troubleshooting models. In this paper we show 
how models that were originally designed as independent models can be 
used together while memory space and computational time are kept low. 
We allow models to be overlapping, i.e., two or more models may contain 
equivalent troubleshooting steps and/or equivalent problem causes (de- 
vice faults). We propose a troubleshooting procedure that can be used 
with many simultaneous models at once. The key that enables us to join 
the models together is the single fault assumption, which means that 
there is only one fault causing a device malfunction at a time. 



1 SACSO Troubleshooting Approach 

We start with a review of the SACSO troubleshooting approach proposed for 
troubleshooting with a single model. The approach was implemented in the HP 
BATS troubleshooter [2] . The goal of a troubleshooting task is to find and remove 
the cause of a device malfunction. In the case of a complex device, such as
a laser printer, it is convenient to create several models, each designed to solve a
particular problem. All original troubleshooting models $M_i$, $i = 1, 2, \ldots, N$, have
a similar structure. Each model $M_i$ describes relations between a set of repair
actions $A_i$, a set of observations $O_i$, and a set of causes $C_i$ that can be solved
within model $M_i$. Repair actions are actions that can directly solve the problem,
while observations cannot solve the problem directly, but may help identify the
problem cause.

It is assumed that only one cause from $C_i$ can be the cause of a device
malfunction at a time. This is often referred to as the single fault assumption.
This assumption is reasonable when troubleshooting printing systems and similar
man-made devices. Therefore, each cause can be represented as a state $c \in C_i$ of
a single cause variable $C_{M_i}$. The state space of each variable $C_{M_i}$ is extended
by an additional state n.a. This state corresponds to the case when the true
cause of the problem is not addressed in model $M_i$. In other words, $C_{M_i}$ = n.a.
corresponds to the situation when model $M_i$ does not solve the problem.

It is also assumed that actions and observations are independent given the
cause. This assumption implies that if the cause of the problem is known then
neither the fact that an action failed to solve the problem nor the outcome of
an observation affects the probability of any other action solving the problem.
In Fig. 1 an example of two SACSO troubleshooting models is shown.



[Fig. 1. Two example SACSO troubleshooting models; the figure itself is not reproduced here.]





Remark 1. In the HP BATS troubleshooter both repair actions and observations 
are included. To keep the exposition simple we stick to actions only. However, 
the approach of this paper is suitable also for models containing both repair 
actions and observations. 

For every action $A$ and for each cause $c \in \mathcal{C}_i$ included in the model
$M_i$, the conditional probabilities $P_i(A = \text{yes} \mid CM_i = c)$ are provided
by a domain expert, and it is assumed that $P_i(A = \text{yes} \mid CM_i = \text{n.a.}) = 0$.
We say that an action $A$ can solve a cause $c$ in a model $M_i$ if
$P_i(A = \text{yes} \mid CM_i = c) > 0$. The set of causes that can be solved by an
action $A$ in a model $M_i$ is denoted by $pa_i(A)$, $pa_i(A) \subseteq \mathcal{C}_i$.
The set of actions that can solve a cause $c$ in a model $M_i$ is denoted by
$ch_i(c)$, $ch_i(c) \subseteq \mathcal{A}_i$.

Each action $A$ has an associated cost $cost(A)$. It may correspond to the time
needed to perform action $A$, the money spent when performing action $A$, a
combination of time and money, or another criterion. The troubleshooting task is to
find a sequence of actions that minimizes the expected cost of repair, i.e., the
expected total cost of all actions performed until the problem is solved.
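
As a minimal sketch of the quantity being minimized, the expected cost of repair of a fixed action sequence can be computed as below. It assumes the single fault assumption and the conditional independence of actions given the cause; the function name, the dictionary layout, and all numbers are hypothetical and chosen only for illustration.

```python
def expected_cost_of_repair(sequence, prior, p_solve, cost):
    """Expected total cost of performing `sequence` until one action succeeds.

    prior:   {cause: P(cause)}                      (illustrative numbers)
    p_solve: {action: {cause: P(action = yes | cause)}}
    cost:    {action: cost(action)}
    """
    ecr = 0.0
    # still_failing[c] = P(c is the true cause and every action so far failed)
    still_failing = dict(prior)
    for action in sequence:
        # The action is only performed (and paid for) if all earlier ones failed.
        ecr += cost[action] * sum(still_failing.values())
        for cause in still_failing:
            still_failing[cause] *= 1.0 - p_solve[action].get(cause, 0.0)
    return ecr


# Hypothetical two-cause example with disjoint pa(A1) and pa(A2).
prior = {"c1": 0.6, "c2": 0.4}
p_solve = {"A1": {"c1": 1.0}, "A2": {"c2": 1.0}}
cost = {"A1": 2.0, "A2": 1.0}

ecr_a1_first = expected_cost_of_repair(["A1", "A2"], prior, p_solve, cost)
ecr_a2_first = expected_cost_of_repair(["A2", "A1"], prior, p_solve, cost)
```

In this toy example the ratio P(A = yes) / cost(A) is 0.3 for A1 and 0.4 for A2, and the sequence that starts with A2 has the lower expected cost.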

It has been shown that in the case of actions $A \in \mathcal{A}_i$ with disjoint
$pa_i(A)$ it suffices to order them decreasingly according to the ratio
$P_i(A = \text{yes}) / cost(A)$ (see [4]). However, in [7] it was shown that in the
case of overlapping $pa_i(A)$ the troubleshooting task is NP-hard.^1 The heuristic
algorithm that is the essence of the SACSO troubleshooting procedure [2, 5]
consists of three basic steps that are repeated until the problem is solved:



1. Select a repair action of the highest efficiency

   $$\mathrm{eff}(A) = \frac{P_i(A = \text{yes} \mid h)}{cost(A)} ,$$

   where $h$ denotes evidence introduced by the troubleshooting history. This
   corresponds to the evidence that all performed actions failed to solve the
   problem.

2. Perform the chosen action and observe the result. 

3. If the performed action did not solve the problem then enter the outcome of 
the troubleshooting step into the model and update the model. 



The reader interested in details or in other approximate methods used to find a 
troubleshooting strategy is referred to [2, 5, 7]. 

In the rest of this paper we discuss how troubleshooting with simultane- 
ous models can be performed. In Sec. 2 we describe how single troubleshooting 
models can be joined together. In Sec. 3 we apply the SACSO troubleshooting 
procedure to the troubleshooting with many simultaneous models. In Sec. 4 we 
propose how probabilities of causes can be initiated when only a vague descrip- 
tion of a problem is provided. 



2 Simultaneous Models 

The current approach to multiple models, implemented in the HP BATS trou- 
bleshooter [2,5], is, first, to use the authoring tool [6]. This creates dozens of 
models, with each model related to a particular problem. When troubleshooting 
with the HP BATS troubleshooter, the user selects one model with the help of 
a selection tree. Then she performs troubleshooting with the chosen model as 
described above. The problem with this approach is that the user may not know
exactly what the problem is, so it may not be clear which model to select.
Therefore we need a way to work with several models simultaneously.

In Fig. 2 a scheme for troubleshooting with simultaneous models is displayed. 
The troubleshooter consists of troubleshooting models $M_1, M_2, \ldots, M_N$ and a
supermodel. The supermodel reflects dependencies between causes and problems. 
It uses the user’s problem description and answers to certain questions that may 
help identify the problem, it communicates with the troubleshooting models, 

^1 If $\forall A \in \mathcal{A}_i : |pa_i(A)| \le 2$ then the complexity is still undecided.
[Figure 2, diagram. The user's problem description enters the supermodel
(problems, questions), whose algorithm selects either the next question Q* to
ask (receiving the answer) or the next action A* to perform (receiving the
outcome of the action). The supermodel communicates with the troubleshooting
models M_1, M_2, ..., M_N, where model M_i contains causes C_i and actions A_i.]

Fig. 2. A basic scheme for troubleshooting with simultaneous models



and it realizes the troubleshooting algorithm, i.e., it selects a best next action 
to perform and updates the troubleshooting models by the observed outcomes. 

It will turn out that under assumptions discussed in this section we can join 
together all SACSO models and create a single Bayesian network which we will 
call the joint model. Then we can simply apply the algorithm used for trou- 
bleshooting with a single SACSO model to troubleshooting with simultaneous 
models. The supermodel is discussed in detail in Sec. 4. 

Generally, there can be identical problem causes and actions that appear in 
more than one SACSO model. For example, “Media out of specification” 
can be a cause of “Paper Jam”, “Spots”, or “Temporary problem solvable 
by cycling power”. For each of these three problems a single SACSO model is 
designed. If it is possible, we identify equivalent causes across all models auto- 
matically. Otherwise we need to consult a domain expert. We assign the same 
index to equivalent causes. Similarly, an automatic program or a domain ex- 
pert should identify equivalent troubleshooting actions across all models. Again, 
we will assign the same index to equivalent actions. The joint model contains 
all problem causes and all troubleshooting actions from the individual SACSO 
models, i.e., 

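$$\mathcal{C} = \bigcup_{i=1}^{N} \mathcal{C}_i \setminus \{\text{n.a.}\} \quad \text{and} \quad \mathcal{A} = \bigcup_{i=1}^{N} \mathcal{A}_i .$$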

A question arises whether we can declare two causes $c \in \mathcal{C}_i$,
$c' \in \mathcal{C}_j$ identical if $ch_i(c) \ne ch_j(c')$. For example, in Fig. 1,
cause $c_3$ of model $M_1$ is solved by action $A_1$, but action $A_1$ is not present
in model $M_2$ where cause $c_3$ is present. A reason for this situation can be that
an expert did not want to include the action that is part of the first model in
the context of the second model, e.g., because the model would be too complex
for the audience. We allow such situations and treat two identical causes solved
by different actions as one cause solved by all actions that can solve that cause
in any SACSO model. It means that for each cause $c \in \mathcal{C}$ the set of its
children in the joint model is

$$ch(c) = \bigcup_{i=1}^{N} ch_i(c) .$$

A similar question is whether two actions $A \in M_i$, $A' \in M_j$ can be declared
identical if $pa_i(A) \ne pa_j(A')$. For example, in Fig. 1, action $A_2$ can solve
cause $c_4$ in model $M_2$ but not in model $M_1$, since this model does not contain
this cause. This situation will appear quite naturally, since no expert would
like to list all possible causes that can be solved by an action no matter what
the problem is. We treat two identical actions solving different causes as one
action solving all causes solved in any SACSO model. It means that for each
action $A \in \mathcal{A}$ the set of its parents in the joint model is

$$pa(A) = \bigcup_{i=1}^{N} pa_i(A) .$$

Let us use the example of the two models from Fig. 1 to explain how to
define the conditional distributions attached to actions present in more than one
model. Action $A_2$ solves cause $c_2$ with probability
$P_1(A_2 = \text{yes} \mid CM_1 = c_2)$ and $c_3$ with probability
$P_1(A_2 = \text{yes} \mid CM_1 = c_3)$ in model $M_1$. In model $M_2$ it solves cause
$c_3$ with probability $P_2(A_2 = \text{yes} \mid CM_2 = c_3)$ and cause $c_4$ with
probability $P_2(A_2 = \text{yes} \mid CM_2 = c_4)$. It is natural to have the same
conditional probabilities in the joint model. A question is what to do if, for
example, $P_1(A_2 = \text{yes} \mid CM_1 = c_3) \ne P_2(A_2 = \text{yes} \mid CM_2 = c_3)$.
We believe that the only reason for the difference can be that it is difficult
for an expert to be 100 % consistent. Therefore we either ask an expert to
resolve this inconsistency or we simply take the average of the two numbers. In
the rest of this paper we assume that for any two models $M_i \ne M_j$ containing
an action $A$ and a cause $c$ with $c \in pa_i(A)$ and $c \in pa_j(A)$ it holds that

$$P_i(A \mid CM_i = c) = P_j(A \mid CM_j = c) .$$

In the individual SACSO models the basic assumption used is the single fault
assumption. It is reasonable to keep this assumption in the joint model as well,
since it is a characteristic of the device and it cannot be influenced by the fact
that a user does not know what the problem is. The single fault assumption is
encoded in the joint model using the node $CA$ that has all possible causes as its
states. We also keep the conditional independence of actions given the cause in
the joint model.
The arguments provided above lead to a unique way of combining the
SACSO models into a joint model. The joint model has a Naive Bayes structure,
whose conditional probability distributions are given by the following definition.

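Definition 1. Let $m(c \to A)$ define a function such that for any cause
$c \in \mathcal{C}$ and any action $A \in \mathcal{A}$

$$m(c \to A) = \begin{cases} i & \text{for an } i \text{ such that } A \in M_i \wedge c \in pa_i(A), \\ 0 & \text{otherwise.} \end{cases}$$

In the joint model the conditional probability of an action $A \in \mathcal{A}$ given
a cause $c \in \mathcal{C}$ is defined as

$$P(A = \text{yes} \mid CA = c) = \begin{cases} 0 & \text{if } m(c \to A) = 0, \\ P_m(A = \text{yes} \mid CM_m = c) & \text{if } m = m(c \to A). \end{cases}$$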
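
The construction of the joint model can be sketched as follows. This is only an illustration under assumed data structures: each model is a dictionary mapping an action to the causes it can solve, `build_joint` and `p_joint` are hypothetical names, and the choice of the first matching model for $m(c \to A)$ is arbitrary, which is harmless when the shared probabilities agree. The numbers, the action `A3`, and the assignment of `c1` to `A1` are invented; only the overlap pattern of `A2` and `c3` follows the example discussed in the text.

```python
def build_joint(models):
    """models[i-1][action][cause] = P_i(action = yes | CM_i = cause)."""
    pa, ch, m = {}, {}, {}
    for i, model in enumerate(models, start=1):
        for action, cpt in model.items():
            for cause, p in cpt.items():
                if p > 0:
                    pa.setdefault(action, set()).add(cause)   # pa(A) = union of pa_i(A)
                    ch.setdefault(cause, set()).add(action)   # ch(c) = union of ch_i(c)
                    m.setdefault((cause, action), i)          # keep the first model found
    return pa, ch, m


def p_joint(models, m, action, cause):
    """P(action = yes | CA = cause) in the joint model, read from model m(c -> A)."""
    i = m.get((cause, action), 0)
    return 0.0 if i == 0 else models[i - 1][action][cause]


# Hypothetical numbers for two overlapping models.
model_1 = {"A1": {"c1": 0.9, "c3": 0.5}, "A2": {"c2": 0.8, "c3": 0.7}}
model_2 = {"A2": {"c3": 0.7, "c4": 0.6}, "A3": {"c4": 0.9}}
models = [model_1, model_2]

pa, ch, m = build_joint(models)
```

Here `pa["A2"]` collects the causes of `A2` from both models, and `p_joint(models, m, "A1", "c4")` is zero because no model lets `A1` solve `c4`.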



In Fig. 3 we give an example of a joint model, which is composed from two 
SACSO models of Fig. 1. 







Fig. 3. The two SACSO troubleshooting models from Fig. 1 joined together. 



3 A Troubleshooting Procedure 

In Sec. 1 we sketched the SACSO troubleshooting procedure for an individual 
SACSO model. In Sec. 2 we described how the joint model is created from the 
SACSO models. In this section we propose an algorithm for troubleshooting with 
simultaneous models that is an application of the SACSO troubleshooting pro- 
cedure to the joint model. We show that we need not even create the joint model 
since all information can be stored in and read from the SACSO models. We also 
demonstrate that a simple naive approach to troubleshooting with simultaneous 
models ignoring the fact that different models may contain identical causes and 
actions may provide erroneous results. 

We require $P_i(A = \text{yes} \mid CM_i = c)$ to be equal for all models $M_i$ that
include both $A$ and $c$ with $c \in pa_i(A)$. Therefore it does not matter which of
these models is chosen by the function $m(c \to A)$. In fact, also after inserting
a troubleshooting history $h$ as evidence the function $m(c \to A)$ can be used to
select a model for reading $P(A \mid CA = c, h)$.

Lemma 1. For the conditional probability $P(A \mid CA = c, h)$ of $A$ given a cause
$c \in pa(A)$ and a troubleshooting history $h$ in the joint model it holds that

$$P(A \mid CA = c, h) = P_{m(c \to A)}(A \mid CM_{m(c \to A)} = c) .$$

Proof. By the conditional independence of actions given the cause in the joint
model we get

$$P(A \mid CA = c, h) = P(A \mid CA = c) .$$

From Definition 1 we read that

$$P(A \mid CA = c) = P_{m(c \to A)}(A \mid CM_{m(c \to A)} = c) ,$$

which proves the assertion of the Lemma. □

The following lemma underlies the simple propagation scheme for Naive 
Bayes models. It will be used in the troubleshooting procedure. 

Lemma 2. Consider the joint model with a troubleshooting history $h$. Let
$L = |\mathcal{C}|$. Then the probabilities of causes $c \in \mathcal{C}$ of the joint
model can be updated in the light of new evidence $e = \{A = \text{no}\}$ by the
following formula

$$P(CA = c \mid e, h) := \frac{P(e \mid CA = c) \cdot P(CA = c \mid h)}{P(e \mid h)} ,$$

where $P(e \mid h) = \sum_{l=1}^{L} P(e \mid CA = c_l) \cdot P(CA = c_l \mid h)$.

Proof. Using Bayes' rule we can write

$$P(CA = c \mid e, h) = \frac{P(e, CA = c \mid h)}{P(e \mid h)} . \qquad (1)$$

Since actions are independent given a cause in the joint model we get

$$P(e, CA = c \mid h) = P(e \mid CA = c) \cdot P(CA = c \mid h) . \qquad (2)$$

The probability $P(e \mid h)$ can be computed as

$$P(e \mid h) = \sum_{l=1}^{L} P(e \mid CA = c_l) \cdot P(CA = c_l \mid h) . \qquad (3)$$

Substituting $P(e \mid h)$ from (3) and $P(e, CA = c \mid h)$ from (2) into (1) we get the
assertion of the lemma. □
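
To make the update of Lemma 2 concrete, here is a small worked example with invented numbers. Assume three causes with current probabilities 0.5, 0.3 and 0.2, and an action $A$ with $P(A = \text{yes} \mid CA = c_l)$ equal to 0.8, 0.5 and 0 respectively, so that $c_3 \notin pa(A)$. The values are purely illustrative.

```latex
% Evidence e = {A = no}; P(e | CA = c_l) = 1 - P(A = yes | CA = c_l).
\begin{align*}
P(e \mid h) &= 0.2 \cdot 0.5 + 0.5 \cdot 0.3 + 1 \cdot 0.2 = 0.45, \\
P(CA = c_1 \mid e, h) &= \frac{0.2 \cdot 0.5}{0.45} = \tfrac{2}{9} \approx 0.222, \\
P(CA = c_2 \mid e, h) &= \frac{0.5 \cdot 0.3}{0.45} = \tfrac{1}{3} \approx 0.333, \\
P(CA = c_3 \mid e, h) &= \frac{1 \cdot 0.2}{0.45} = \tfrac{4}{9} \approx 0.444 .
\end{align*}
```

The failed action shifts probability mass towards the cause it cannot solve, and the updated probabilities again sum to one.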
Table 1. A troubleshooting procedure

1. Initiate $h$, $\mathcal{A}(h) := \mathcal{A}$, and $P(CA = c)$, $c \in \mathcal{C}$.

2. For each action $A \in \mathcal{A}(h)$ compute

   $$P(A = \text{yes} \mid h) := \sum_{c \in pa(A)} P(A = \text{yes} \mid CA = c) \cdot P(CA = c \mid h) ,$$

   $$\mathrm{eff}(A \mid h) := \frac{P(A = \text{yes} \mid h)}{cost(A)} .$$

3. Perform action $A^* := \arg\max_{A \in \mathcal{A}(h)} \mathrm{eff}(A \mid h)$.

4. If $A^*$ solves the problem then quit,
   otherwise set $e := \{A^* = \text{no}\}$ and continue with step 5.

5. For each $c \in pa(A^*)$ compute

   $$f(c, e \mid h) := P(e \mid CA = c) \cdot P(CA = c \mid h) .$$

6. For each $c \in \mathcal{C} \setminus pa(A^*)$ set $f(c, e \mid h) := P(CA = c \mid h)$.

7. Compute the normalization constant

   $$P(e \mid h) := \sum_{c \in \mathcal{C}} f(c, e \mid h) .$$

8. Normalize, i.e., for each $c \in \mathcal{C}$

   $$P(CA = c \mid e, h) := \frac{f(c, e \mid h)}{P(e \mid h)} .$$

   Update $h := h \cup \{A^* = \text{no}\}$ and $\mathcal{A}(h) := \mathcal{A}(h) \setminus \{A^*\}$.
   If $\mathcal{A}(h) = \emptyset$ then quit. Otherwise, go to step 2.
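
The steps of Table 1 can be sketched in a few lines of Python. This is an illustrative sketch, not the HP BATS implementation: the function name `troubleshoot`, the dictionary layout, the `solves` callback that reports the observed outcome, and all numbers are assumptions made for the example.

```python
def troubleshoot(prior, p_yes, cost, solves):
    """Greedy procedure of Table 1 applied to the joint model.

    prior:  {cause: P(CA = cause)}
    p_yes:  {action: {cause: P(action = yes | CA = cause)}}, causes in pa(action) only
    cost:   {action: cost(action)}
    solves: callback action -> bool giving the observed outcome (hypothetical interface)
    """
    belief = dict(prior)       # step 1: P(CA = c | h) for the empty history
    remaining = set(p_yes)     # A(h)
    performed = []
    while remaining:
        # step 2: efficiency of every remaining action
        eff = {a: sum(p * belief[c] for c, p in p_yes[a].items()) / cost[a]
               for a in remaining}
        # step 3: most efficient action (sorted only to break ties deterministically)
        best = max(sorted(remaining), key=eff.get)
        performed.append(best)
        if solves(best):       # step 4
            return performed, belief, True
        # steps 5-6: f(c, e | h); causes outside pa(A*) keep their probability
        f = {c: (1.0 - p_yes[best].get(c, 0.0)) * belief[c] for c in belief}
        z = sum(f.values())    # step 7: P(e | h)
        if z == 0.0:
            raise ValueError("evidence has zero probability under the model")
        belief = {c: v / z for c, v in f.items()}   # step 8
        remaining.remove(best)
    return performed, belief, False


# Hypothetical example in which the true (hidden) cause is c3.
prior = {"c1": 0.5, "c2": 0.3, "c3": 0.2}
p_yes = {"A1": {"c1": 0.8, "c2": 0.5}, "A2": {"c3": 1.0}, "A3": {"c2": 0.9}}
cost = {"A1": 1.0, "A2": 1.0, "A3": 2.0}

performed, belief, solved = troubleshoot(prior, p_yes, cost, lambda a: a == "A2")
```

In this run `A1` is tried first because it has the highest efficiency, it fails, the probabilities of causes are updated, and `A2` then becomes the most efficient action and solves the problem.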



Using Lemma 2 we establish a fast updating scheme. The full troubleshoot- 
ing procedure based on this updating scheme is described in Table 1. We note 
that the procedure corresponds to the SACSO troubleshooting procedure [2, 
5] applied to the joint model. This procedure does not provide optimal trou- 
bleshooting strategies. However, when troubleshooting printers, it was shown 
that the strategies provided by the SACSO troubleshooting procedure are very 
close to optimal strategies [2].

Proposition 1. The troubleshooting procedure described in Table 1 can be per- 
formed using the original SACSO models only, i.e., we need not create the joint 
model. 

Proof. Observe that $P(A = \text{yes} \mid CA = c)$, needed in step 2, can be read from
model $M_i$, $i = m(c \to A)$, as $P_i(A = \text{yes} \mid CM_i = c)$, and
$P(e \mid CA = c)$, needed in step 5, can be read from model $M_j$, $j = m(c \to A^*)$,
as $P_j(A^* = \text{no} \mid CM_j = c)$ (Lemma 1). We only need to store the
repeatedly updated probabilities of causes $P(CA = c_l \mid h)$ somewhere. A
convenient location can be the variables $CM_i$ in the models $M_i$,
$i = 1, 2, \ldots, N$, where, having new evidence $e$, we update the probability
distribution $P_i(CM_i = c_l \mid h)$. Thus we keep all models updated and we can
read the probability $P(CA = c_l \mid h)$ from any model containing the cause
$c_l$. □

The complexity of the proposed troubleshooting procedure, if used to provide
a full troubleshooting sequence, is only $O(|\mathcal{A}|^2 \times |\mathcal{C}|)$, where
$|\mathcal{A}|$ is the number of different actions and $|\mathcal{C}|$ is the number of
different causes over all models.
Furthermore, we may substantially speed up the computations if we check for
causes having $P(CA = c \mid h)$ equal or close to zero and disqualify such causes
from further computations. Consequently we need not work with a great many
models at the same time, since all models $M_i$ having
$P_i(CM_i = \text{n.a.} \mid h)$ equal or close to one can be disqualified from
further computations.

In the following remark we show that if the probability of an action solving 
the problem was computed as the total sum over all SACSO models we would 
get erroneous results. 

Remark 2. Note that each cause $c \in pa(A)$ is included in the sum of step 2 only
once, which is generally different from the sum

$$\sum_{j \in \{1, 2, \ldots, N\}} P_j(A = \text{yes} \mid h) = \sum_{j \in \{1, 2, \ldots, N\}} \sum_{c \in pa_j(A)} P_j(A = \text{yes} \mid CM_j = c) \cdot P_j(CM_j = c \mid h) ,$$

where each cause appears as many times as the number of models it is contained
in. Therefore, if this formula were used to estimate $P(A = \text{yes} \mid h)$, it
would disproportionally favour actions solving causes that are contained in more
models.
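
The effect described in Remark 2 can be reproduced numerically. The sketch below is illustrative only: the two function names are hypothetical, the numbers are invented, and it assumes (as in the proof of Proposition 1) that every model stores the same updated probabilities of causes.

```python
def p_yes_joint(action, models, belief):
    """Sum of step 2: every cause in pa(action) is counted exactly once."""
    seen = {}
    for model in models:
        for cause, p in model.get(action, {}).items():
            seen.setdefault(cause, p)
    return sum(p * belief[c] for c, p in seen.items())


def p_yes_naive(action, models, belief):
    """Naive total over all models: shared causes are counted once per model."""
    return sum(p * belief[c]
               for model in models
               for c, p in model.get(action, {}).items())


# Hypothetical numbers; cause c3 is shared by both models.
model_1 = {"A2": {"c2": 0.8, "c3": 0.7}}
model_2 = {"A2": {"c3": 0.7, "c4": 0.6}}
belief = {"c2": 0.2, "c3": 0.5, "c4": 0.3}

joint_value = p_yes_joint("A2", [model_1, model_2], belief)
naive_value = p_yes_naive("A2", [model_1, model_2], belief)
```

With these numbers the joint sum is 0.69, while the naive sum counts the shared cause twice and exceeds one, so it cannot be a probability.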

4 Initial Probabilities of Causes

The reader has probably realized that we have not discussed how the probabilities 
of the causes are defined when creating the joint model. Since this task requires 
a deeper discussion we have left it for an independent section. 

In order to be able to reflect dependencies between causes and problems we
create a Bayesian network model, which we call the supermodel. This model
can use the user's problem description and answers to certain questions that may
help identify the problem. The problem variable $PR$ has all possible problems
$pr_1, pr_2, \ldots, pr_K$ as its states. The supermodel is connected to the joint
model through variable $CA$. Expert knowledge of dependence between problems and
causes is encoded in the conditional probability distribution

$$P(CA = c_l \mid PR = pr_k), \quad k = 1, 2, \ldots, K, \quad l = 1, 2, \ldots, L .$$

This means that for each problem $pr_k$ the expert distributes the probability mass
between the causes.
Remark 3. When building a single SACSO model $M_i$ a domain expert provided
the initial probabilities of causes assuming the problem being solved in that
model, i.e. she provided the conditional probabilities
$P_i(CM_i = c \mid problem_i)$, where $problem_i$ is the problem solved in model
$M_i$ (see [6] for details). If there is a one-to-one correspondence between
models and problems, i.e., $problem_k \equiv pr_k$ for $k = 1, 2, \ldots, K$ and
$K = N$, then we can use the initial probabilities of
causes from the SACSO models to define the conditional probability distribution
$P(CA \mid PR)$, so that for $k = 1, 2, \ldots, K$

$$P(CA = c_l \mid PR = pr_k) = \begin{cases} P_k(CM_k = c_l) & \text{if } c_l \in \mathcal{C}_k , \\ 0 & \text{otherwise.} \end{cases}$$

When starting with a particular troubleshooting process the model should
reflect all the prior knowledge. Prior knowledge is summarized by means of the
prior probability distribution of the variable $PR$:

$$P(PR = pr_1), \; P(PR = pr_2), \; \ldots, \; P(PR = pr_K) .$$

For example, these probabilities can be a result of a text mining task performed
on the user's description of the problem and the observations the user made.

At the beginning of a troubleshooting process or during the process the user
can be asked certain questions $Q_1, Q_2, \ldots, Q_J$ which may help identify the
problem. For each question $Q_j$ and for each problem $pr_k$, a conditional
probability distribution $P(Q_j \mid PR = pr_k)$ is given. It is assumed that
question $Q_i$ is independent of $Q_j$ given $PR$ for any $i \ne j$,
$i, j \in \{1, 2, \ldots, J\}$. An example of a
supermodel connected to a joint model is given in Fig. 4.



Fig. 4. The supermodel and the joint model joined together



In order to select the most informative question given a current probability 
distribution over problems P{PR \ h) we can use standard methods for value of 
information [1]. An overview of these methods can be found, e.g., in [3]. Methods 
used to select questions in the full version of the SACSO algorithm [2,5], i.e.,
the probability of a question identifying the problem and the expected cost of 
observation, are suitable here as well. These methods can be also used to decide 
whether we will ask the user more questions. In Table 2 the initialization of 
probabilities of causes is formally described. 



Table 2. Initialization of the probabilities of causes.

1. For each problem $pr_k \in \mathcal{PR}$ and for each cause $c_l \in \mathcal{C}$, ask a
   domain expert to provide $P(CA = c_l \mid PR = pr_k)$.

2. Ask the domain expert to propose a set of questions $\mathcal{Q}$. For all answers
   $q_j$ to all questions $Q_j \in \mathcal{Q}$ and for all possible problems
   $pr_k \in \mathcal{PR}$ ask the domain expert to provide
   $P(Q_j = q_j \mid PR = pr_k)$. Use the knowledge acquisition methodology
   described in [6].

3. Set $h := \emptyset$.

4. Derive $P(PR = pr_k)$, $k \in \{1, 2, \ldots, K\}$, from the user's problem
   description.

5. Ask the user the most informative question $Q_j \in \mathcal{Q}$ given the history
   $h$. Record the answer $q_j$.

6. For each $pr_k$, $k \in \{1, 2, \ldots, K\}$, compute

   $$P(pr_k \mid h \cup \{Q_j = q_j\}) := \frac{P(Q_j = q_j \mid PR = pr_k) \cdot P(pr_k \mid h)}{\sum_{k'=1}^{K} P(Q_j = q_j \mid PR = pr_{k'}) \cdot P(pr_{k'} \mid h)}$$

   and set $h := h \cup \{Q_j = q_j\}$.

7. Use a criterion to decide whether you will ask the user more questions. If
   yes, go to step 5, else initialize $P(CA = c \mid h)$ for all $c \in \mathcal{C}$ as

   $$P(CA = c \mid h) = \sum_{k=1}^{K} P(CA = c \mid PR = pr_k) \cdot P(PR = pr_k \mid h)$$

   and initiate $P_i(CM_i = c \mid h) = P(CA = c \mid h)$ in the individual SACSO
   models $M_i$, $i = 1, 2, \ldots, N$.
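
Steps 6 and 7 of Table 2 can be sketched as two small functions. This is a sketch under assumed data structures: `update_problems` and `init_causes` are hypothetical names, the question likelihoods and cause distributions are invented, and only the problem names are borrowed from the printer example in Sec. 2.

```python
def update_problems(p_problem, likelihood):
    """Step 6: Bayes update of P(pr_k | h) after observing the answer Q_j = q_j.

    p_problem:  {problem: P(problem | h)}
    likelihood: {problem: P(Q_j = q_j | PR = problem)}
    """
    joint = {k: likelihood[k] * p for k, p in p_problem.items()}
    z = sum(joint.values())
    if z == 0.0:
        raise ValueError("the answer has zero probability under the supermodel")
    return {k: v / z for k, v in joint.items()}


def init_causes(p_problem, p_cause_given_problem):
    """Step 7: P(CA = c | h) = sum_k P(CA = c | PR = pr_k) * P(PR = pr_k | h)."""
    causes = {}
    for k, p_k in p_problem.items():
        for c, p in p_cause_given_problem[k].items():
            causes[c] = causes.get(c, 0.0) + p * p_k
    return causes


# Hypothetical supermodel with two problems sharing cause c2.
p_problem = {"Paper Jam": 0.5, "Spots": 0.5}
answer_likelihood = {"Paper Jam": 0.9, "Spots": 0.3}
p_cause_given_problem = {
    "Paper Jam": {"c1": 0.6, "c2": 0.4},
    "Spots": {"c2": 0.2, "c3": 0.8},
}

posterior = update_problems(p_problem, answer_likelihood)
p_causes = init_causes(posterior, p_cause_given_problem)
```

The resulting `p_causes` would then be copied into the cause variables of the individual models, as described in step 7.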



If we prefer to allow general questions from $\mathcal{Q}$ to be asked during a
troubleshooting session then we need to communicate the probability distribution
$P(CA \mid h)$ between the SACSO models and the supermodel. When necessary we
can return to the supermodel and update it, i.e. in the supermodel we propagate

$$P(CA = c \mid h) = P_m(CM_m = c \mid h), \quad \text{for all } c \in \mathcal{C} ,$$

where $m$ is the index of any model containing cause $c$. Then we use the
algorithm of Table 2 starting with step 5 and using the updated probabilities to
decide which questions we ask. Finally, when we decide to return to the
troubleshooting procedure we simply replace the values of $P_i(CM_i = c \mid h)$,
$i = 1, 2, \ldots, N$, by the values computed in the supermodel as proposed in step 7.

5 Conclusions 

We have presented a fast method that can be used to combine information from 
thousands of troubleshooting models that may share certain problem causes 
and solution actions. The single fault assumption allowed us to derive a simple
scheme for updating probabilities of causes in the models. Thus we were able to 
use the same troubleshooting methods as if we were working with a single joint 
model. 
Abstract. We present a decision model apt to handle preference relations
among conditional acts, not necessarily satisfying transitivity and
the sure thing principle. We also give a characterization of preference
relations agreeing with such a model, by means of rationality conditions
interpretable in terms of a betting scheme.



1 Introduction 

Any expert system devoted to helping in decisions is grounded in both a model
for handling partial knowledge and a decision model of reference. On the other
hand, it is clear that any non-static model must be based on conditioned objects
and should be able to work with domains containing only the elements and the
information of interest.

In recent years many results in this sense have been obtained for qualitative
and quantitative functions representing uncertainty, but not for decision models.
A conditional decision model is in fact usually derived from an unconditional one
(for a classical reference, see [19]). This is a very restrictive view of conditional
decisions, corresponding trivially to just a modification of the “world” S. It is
instead essential to regard the events $A_i$ conditioning the acts as “variables”
or, in other words, as uncertain events. This point of view gives the decision
maker the possibility of having “a priori” a plan of choice taking into account all
the possible scenarios of interest, including those considered “infinitely less
probable” than others. Similar aims underlie [16], but in that paper only
acts conditioned on the same event can be compared.

We present a decision model for conditional acts which completely captures 
the instances expressed above. It is based on the concepts of coherent conditional 
probability and its relevant characterization in terms of classes of unconditional 
probabilities $P_\alpha$ (see for instance [2], [5]), representing in fact the different layers
of degree of belief ([14] section 8 or [7]). For connections between layers and 
Spohn’s theory see [8], in this issue. 

The decision maker using this rational conditional model prefers an act $f$
conditioned on an event $A$ to an act $g$ conditioned on an event $B$ ($f_A$, $g_B$
respectively), taking into account the utility which he attributes to the acts $f_A$
and $g_B$ and the degree of belief in the occurrence of $A$ and $B$, when he supposes
the occurrence of $A \vee B$. The preference relation among the conditional acts is
in fact “locally” represented by a coherent conditional prevision which is the
product of a utility function and a coherent conditional probability.

Precisely, “locally” means that strict monotonicity with the preference
relation is assured only in the possible world of the events of the same
probabilistic layer. For this reason the sure thing principle can be violated by
the strict preference, and transitivity is not assured for the symmetric part of
the relation (the indifference relation).

We also present conditions of rationality (for both the finite and the infinite
framework), interpretable in terms of a betting scheme. They are necessary and
sufficient conditions for the existence of a rational conditional model agreeing with
the preference relation. The sketched proof of the characterization theorem, for
the finite case, suggests the main steps for implementing an algorithm to actually
build the conditional decision model, starting from a (possibly parsimonious) set
of conditional acts and a (not necessarily complete) preference relation.

2 Coherent Conditional Previsions and Probabilities 

In this section we recall some results related to the concept of coherence both
for conditional previsions and conditional probabilities. Coherence conditions
are essentially rules that permit a consistent definition of probability
or prevision without requiring the set of events or random quantities to be
closed with respect to logical or algebraic operations, respectively.

2.1 Coherent Conditional Previsions 

Given a random quantity $X$ and an event $H \ne \emptyset$, $X|H$ denotes the random
quantity conditioned on $H$, that is the random quantity assuming the same
values as $X$ if $H$ is true and undetermined when $H$ is false. We denote by $I_H$
the indicator function of the event $H$ (that is the random quantity assuming
values 1 or 0 according to whether $H$ is true or false, respectively) and by
$X I_H$ the random quantity assuming the same value as $X$ if $H$ is true and 0 if
$H$ is false.

Let $\mathcal{K} = \{X|H : X \in \mathcal{X}, H \in \mathcal{H}\}$ be a set of conditional
random quantities and $P$ a real function defined on $\mathcal{K}$.

It is well known that if $\mathcal{K}$ satisfies:

a) $\mathcal{X}$ is a linear space containing a non-zero constant quantity,

b) $\mathcal{H}$ is an additive set (i.e. closed with respect to finite disjunctions),
with $\mathcal{X}$ containing $I_H$ for every $H \in \mathcal{H}$ and $X I_H$ for every
$X \in \mathcal{X}$ and $H \in \mathcal{H}$,

then $P$ is a conditional prevision if the following conditions hold:

1) $P(\cdot|H)$ is a linear function;

2) $\inf(X|H) \le P(X|H) \le \sup(X|H)$;

3) $P(I_{H_1} X | H_2) = P(X | H_1 \wedge H_2) \cdot P(I_{H_1} | H_2)$, $\forall X, H_1, H_2$.

Definition 1 Let $\mathcal{K} = \{X|H, X \in \mathcal{X}, H \in \mathcal{H}\}$ be an arbitrary
set of conditional random quantities. A function $P : \mathcal{K} \to \mathbb{R}$ is a
coherent conditional prevision if there exists a set
$\mathcal{K}' = \{X|H, X \in \mathcal{X}', H \in \mathcal{H}'\}$ with $\mathcal{X}'$ satisfying (a)
and $\mathcal{H}'$ satisfying (b), $\mathcal{K}' \supseteq \mathcal{K}$, such that there exists
a conditional prevision $P' : \mathcal{K}' \to \mathbb{R}$ extending $P$.

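This definition of coherence is equivalent to that known as “de Finetti
coherence” (dF-coherence), expressed in terms of conditional bets (see [15],
[20], [13], [17]):

Definition 2 Let $\mathcal{K} = \{X|H : X \in \mathcal{X}, H \in \mathcal{H}\}$ be a set of
conditional random quantities. A real function $P$ on $\mathcal{K}$ is a dF-coherent
conditional prevision if:

(CC) for every $X_1|H_1, \ldots, X_n|H_n \in \mathcal{K}$ and
$\lambda_1, \ldots, \lambda_n \in \mathbb{R}$, by putting $H = \bigvee_{i=1}^{n} H_i$,
we have

$$\sup_{H} \sum_{i=1}^{n} \lambda_i \left( X_i - P(X_i|H_i) \right) I_{H_i} \ge 0 ,$$

where the supremum is taken over the cases in which $H$ is true.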

2.2 Coherent Conditional Probabilities 

Let $\mathcal{C} = \mathcal{G} \times \mathcal{B}^0$ be a set of conditional events $E|H$. If
$\mathcal{C}$ is such that $\mathcal{G}$ is a Boolean algebra and
$\mathcal{B} \subseteq \mathcal{G}$ an additive set, $\mathcal{B}^0 = \mathcal{B} \setminus \{\emptyset\}$,
then a function $P : \mathcal{C} \to \mathbb{R}$ is a conditional probability in the sense
of de Finetti [9], Dubins [11], Krauss [14], if it satisfies the following
conditions:

(i) $P(H|H) = 1$, for every $H \in \mathcal{B}^0$,

(ii) $P(\cdot|H)$ is a (finitely additive) probability on $\mathcal{G}$ for any given
$H \in \mathcal{B}^0$,

(iii) $P((E \wedge A)|H) = P(E|H) \cdot P(A|(E \wedge H))$, for every $A \in \mathcal{G}$ and
$E, H, E \wedge H \in \mathcal{B}^0$.

Recall that properties (i) and (iii) are also in the definition by Renyi [18],
where condition (ii) is replaced by the stronger one of $\sigma$-additivity
(obviously, the two definitions are equivalent if the algebra $\mathcal{G}$ is finite).

Definition 3 Given an arbitrary set of conditional events $\mathcal{C}$, a real function
$P(\cdot|\cdot)$ on $\mathcal{C}$ is a coherent conditional probability assessment if
there exists $\mathcal{C}' \supseteq \mathcal{C}$, $\mathcal{C}' = \mathcal{G} \times \mathcal{B}^0$,
with $\mathcal{G}$ a Boolean algebra and $\mathcal{B}$ an additive set, such that there
exists a conditional probability $P'$ on $\mathcal{C}'$ extending $P$.

Definition 4 Given an arbitrary set of conditional events $\mathcal{C}$, a real function
$P(\cdot|\cdot)$ on $\mathcal{C}$ is a dF-coherent conditional probability assessment if
it satisfies condition (CC) in Definition 2, where $X_i = I_{E_i}$ for every $i$.

Coherence and dF-coherence are equivalent (see [13], [17], [20]). A further 
characterization of coherence is given by the following theorem (see, [2], [5]). 

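Theorem 1 Let $\mathcal{C}$ be an arbitrary family of conditional events. For
$\mathcal{F} = \{E_1|H_1, \ldots, E_n|H_n\} \subseteq \mathcal{C}$, by $\mathcal{A}_o(\mathcal{F})$
we indicate the set of atoms $A_r$ generated by the events
$E_1, H_1, \ldots, E_n, H_n$. For a real function $P$ on $\mathcal{C}$ the following
statements are equivalent:

(i) $P$ is a coherent conditional probability on $\mathcal{C}$;

(ii) $P$ is a dF-coherent conditional probability on $\mathcal{C}$;

(iii) for every finite $\mathcal{F} \subseteq \mathcal{C}$ there exists (at least) a class of
probabilities $\{P_0, \ldots, P_k\}$, each probability $P_\alpha$ being defined on a
suitable subset $\mathcal{A}_\alpha(\mathcal{F}) \subseteq \mathcal{A}_o(\mathcal{F})$, such that
for any $E_i|H_i \in \mathcal{F}$ there is a unique $P_\alpha$ with

$$\sum_{A_r \subseteq H_i} P_\alpha(A_r) > 0 , \qquad P(E_i|H_i) = \frac{\sum_{A_r \subseteq E_i \wedge H_i} P_\alpha(A_r)}{\sum_{A_r \subseteq H_i} P_\alpha(A_r)} ;$$

moreover $\mathcal{A}_{\alpha'}(\mathcal{F}) \subset \mathcal{A}_{\alpha''}(\mathcal{F})$ for
$\alpha' > \alpha''$ and $P_{\alpha''}(A_r) = 0$ if $A_r \in \mathcal{A}_{\alpha'}(\mathcal{F})$.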



3 A Conditional Decision Model 

Let $\mathcal{X}$ be a set of real numbers (e.g. monetary payoffs), $S$ a set of states
of nature, and $\mathcal{A}$ a family of subsets of $S$. Let $\mathcal{F}$ be a family of
acts or decisions, i.e. random quantities from $S$ taking values in $\mathcal{X}$.
Given $f \in \mathcal{F}$, $A \in \mathcal{A}$, $A \ne \emptyset$, we indicate by $f_A$ the
conditional act (or conditional random quantity).

For every $A, B \in \mathcal{A}$, $A \wedge B = \emptyset$, we can define the conditional
act $f_A \cup g_B$ which assumes the same values as $f$ if $A$ is true, the same
values as $g$ if $B$ is true, and is undetermined if $A \vee B$ is false.

Let $\mathcal{D}$ be a set of conditional acts: we denote by $\preceq$ a binary,
reflexive, not necessarily complete preference relation in $\mathcal{D}$, expressing
(as in [1]) the intuitive idea of “preferred or indifferent to”. By $\prec$ and
$\sim$ we denote the strict relation and the indifference relation, respectively.

We introduce the conditional decision model by a simple example: 

Example 1. Mister X is a billiards player. Next weekend he can attend
(only) one of three possible competitions $a_1$, $a_2$, $a_3$, which will take place
in three different cities. Both competitions $a_1$ and $a_2$ are organized in illegal
gambling houses and, taking into account confidential information, Mister X
thinks it very probable that in the end both competitions will be aborted. In
contrast, $a_3$ is a legal competition, but the number of participants is fixed at
20 and it is necessary to apply in person. Mister X estimates that he can arrive
to apply on Friday at 9.30, and it is possible that by then it will be too late.

Let $p_1 < p_2 < p_3$ be the fees for $a_1$, $a_2$ and $a_3$ respectively.

In both competitions $a_1$ and $a_2$ there is only one prize (for the winner), $r_1$
and $r_2$ respectively (with $r_1 = r_2$), while in competition $a_3$ there are two
prizes: $r_3^1$ for the first and $r_3^2$ for the second placed (with
$r_3^2 = \frac{1}{4} r_3^1 < \frac{1}{2} r_1$).

Denote by $A_i$ the event “The attendance at competition $a_i$ is successful”,
by $W_i$ the event “one wins $a_i$” ($i = 1, 2, 3$), and by $S_3$ the event “one is
placed second in $a_3$”. Mister X must then choose among the following
conditional acts:

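$$f^i_{A_i} = \begin{cases} r_i - p_i & \text{if } A_i \wedge W_i \text{ is true} \\ -p_i & \text{if } A_i \wedge W_i^c \text{ is true} \\ \text{und.} & \text{if } A_i^c \text{ is true} \end{cases} \qquad (i = 1, 2)$$

$$f^3_{A_3} = \begin{cases} r_3^1 - p_3 & \text{if } A_3 \wedge W_3 \text{ is true} \\ r_3^2 - p_3 & \text{if } A_3 \wedge S_3 \text{ is true} \\ -p_3 & \text{if } A_3 \wedge W_3^c \wedge S_3^c \text{ is true} \\ \text{und.} & \text{if } A_3^c \text{ is true} \end{cases}$$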



Taking into account his information about the events and the consequences of 
the acts, Mister X gives the following preference relation among the conditional 
acts: 

$f_{A_2} \prec f_{A_1} \prec f_{A_1} \cup f_{A_2} \prec f_{A_3} \sim f_{A_1} \cup f_{A_3} \sim f_{A_2} \cup f_{A_3} \sim f_{A_1} \cup f_{A_2} \cup f_{A_3}$

and the following relation expressing his comparative degree of belief (compar- 
ative probability) on the events: 

$\emptyset \prec A_1 \sim A_2 \prec A_1 \vee A_2 \prec A_3 \sim A_1 \vee A_3 \sim A_2 \vee A_3 \sim A_1 \vee A_2 \vee A_3.$

Since we have $A_2 \prec A_1 \vee A_2$ and $A_1 \vee A_3 \sim A_1 \vee A_2 \vee A_3$, the comparative
probability does not satisfy the well-known additivity condition (introduced by
de Finetti [10]) which, for an incomplete relation, reads as follows:

(P) for every $A, B, C$ with $A \wedge C = B \wedge C = \emptyset$:

$A \preceq B \Rightarrow \neg(B \vee C \prec A \vee C)$,

$A \prec B \Rightarrow \neg(B \vee C \preceq A \vee C)$.

Moreover, since $f_{A_2} \prec f_{A_1}$ and $f_{A_1} \cup f_{A_3} \sim f_{A_2} \cup f_{A_3}$, the preference relation
does not satisfy the sure thing principle (ST) (introduced by Savage [19]), which,
in this context, can be expressed as follows:

(ST) for every $f_A, g_B, h_C$ such that $A \wedge C = B \wedge C = \emptyset$ we have:

$f_A \preceq g_B \Rightarrow \neg(g_B \cup h_C \prec f_A \cup h_C)$,

$f_A \prec g_B \Rightarrow \neg(g_B \cup h_C \preceq f_A \cup h_C)$.

Therefore there exists neither a probability representing the comparative
probability (see [10]) nor a linear utility representing the preference relation (see
[19]).

We now introduce a conditional decision model apt to manage situations such
as that described in Example 1.

Given $\mathcal{D}$, we denote by $\mathcal{E}$ the set of events $A$ such that at least one act in $\mathcal{D}$
is conditioned on $A$.

We call a triple $(\mathcal{E}, \mathcal{D}, \preceq)$ a conditional structure if it satisfies the following
properties:

(c1) $I_\emptyset \in \mathcal{D}$;

(c2) if $f_A, g_B \in \mathcal{D}$, then $I_A, I_B, I_{A \vee B} \in \mathcal{D}$;

(c3) $I_\emptyset \prec I_A$ for every $I_A \in \mathcal{D}$, $A \ne \emptyset$;

(c4) $I_A \preceq I_B$ for every $I_A, I_B \in \mathcal{D}$, $A \subseteq B$.

We note that a binary relation is induced on $\mathcal{E}$ by the restriction of $\preceq$ to the
subset of $\mathcal{D}$ containing only the indicators of events. This relation (denoted in the
following by the same symbol $\preceq$) can be viewed as a probability relation among
events, expressing the subjective degree of belief (comparative probability), that
is, the idea of “no more probable than”. It is trivial to see that it satisfies the
following properties:

(c3') $\emptyset \prec A$ for every $A \in \mathcal{E}$, $A \ne \emptyset$;

(c4') $A \preceq B$ for every $A, B \in \mathcal{E}$, $A \subseteq B$.

Definition 5 Let $(\mathcal{E}, \mathcal{D}, \preceq)$ be a conditional structure. We say that $(\mathcal{E}, \mathcal{D}, P, u)$ is
a rational conditional utility model for $(\mathcal{E}, \mathcal{D}, \preceq)$ if:

• $u$ is a real function defined on $\mathcal{D}$;

• $P$ is a coherent conditional probability defined on the set of conditional events
$\{E_i | E_i \vee E_j : E_i, E_j \in \mathcal{E}\}$;

• the function $w$ defined by putting, for every $f_A \in \mathcal{D}$ and $C \in \mathcal{E}$, $C \supseteq A$,

$w(f_A, C) = u(f_A) P(A|C)$,

is a coherent conditional prevision for the acts $(f_A)_C = (f I_A) | C$;

• for every $f_A, g_B \in \mathcal{D}$ we have:

if $f_A \preceq g_B$ then $u(f_A) P(A|A \vee B) \le u(g_B) P(B|A \vee B)$,

if $f_A \prec g_B$ then $u(f_A) P(A|A \vee B) < u(g_B) P(B|A \vee B)$.

For a conditional structure admitting a rational conditional utility model,
the relation $\preceq$ may be non-transitive and need not satisfy the “sure thing
principle”, as the following examples show.

Example 2: Let $A, B, C \in \mathcal{E}$, $f_A, g_B, h_C \in \mathcal{D}$. Let $u$ be a utility function such
that $u(g_B) = u(h_C) = 0$, $u(f_A) \ne 0$, and $P$ a coherent conditional probability
such that $P(A|A \vee C) \ne 0$, $P(A|A \vee B) = P(B|B \vee C) = 0$. Then $f_A \sim g_B$ and
$g_B \sim h_C$, but $f_A \not\sim h_C$.
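
The comparison in the last condition of Definition 5 can be tried out numerically. The following is a minimal sketch, not the authors' construction: it assumes three pairwise incompatible events represented by single atoms, and it assumes the conditional probability is generated by an ordered list of weight tables (`layers`, a hypothetical representation). The numbers are illustrative and are not those of Example 2; they only show that indifference under such a model need not be transitive.

```python
def cond_prob(layers, event, given):
    """P(event | given) from an ordered list of weight tables on atoms.

    The first table giving positive weight to `given` is used
    (`event` and `given` are sets of atoms). Hypothetical representation.
    """
    for weights in layers:
        denom = sum(w for atom, w in weights.items() if atom in given)
        if denom > 0:
            num = sum(w for atom, w in weights.items()
                      if atom in event and atom in given)
            return num / denom
    raise ValueError("conditioning event has no positive layer")


def compare(u_f, A, u_g, B, layers):
    """Return '<', '=' or '>' for u(f)P(A|A v B) against u(g)P(B|A v B)."""
    union = A | B
    lhs = u_f * cond_prob(layers, A, union)
    rhs = u_g * cond_prob(layers, B, union)
    return "<" if lhs < rhs else ">" if lhs > rhs else "="


# Assumed data: B carries all the weight on the first table,
# A and C share the weight on the second one.
A, B, C = {"a"}, {"b"}, {"c"}
layers = [{"b": 1.0}, {"a": 0.5, "c": 0.5}]
u = {"f": 2.0, "g": 0.0, "h": 0.0}

print(compare(u["f"], A, u["g"], B, layers))  # f_A against g_B
print(compare(u["g"], B, u["h"], C, layers))  # g_B against h_C
print(compare(u["f"], A, u["h"], C, layers))  # f_A against h_C
```

With these assumed values the first two comparisons give equality while the third does not, so a relation with $f_A \sim g_B$, $g_B \sim h_C$ and $h_C \prec f_A$ is compatible with the numerical conditions.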

Example 3: Let $A, B, C \in \mathcal{E}$ be such that $A \wedge C = B \wedge C = \emptyset$, and let $P$ be such that

P{A\A V B) = |, P{B\A V B) = I, P{A\A V B V C) = P{B\A V B V C) = 0. 

Consider now $f_A, g_B, h_C \in \mathcal{D}$ and $u$ such that $u(f_A) < u(g_B)$. Then $f_A \prec g_B$
but $f_A \cup h_C \sim g_B \cup h_C$.

Example 2 also shows that the relation induced on $\mathcal{E}$ need not satisfy
condition (P).

However, weaker principles of transitivity and independence are necessary
for a conditional structure admitting a rational utility model, as the following
Proposition proves:

Proposition 1 A conditional structure admitting a rational conditional utility
model necessarily satisfies the following conditions:

D1) (weak transitivity). For every $f_A, g_B, h_C \in \mathcal{D}$:

($f_A \prec g_B$ and $g_B \prec h_C$) $\Rightarrow \neg(h_C \preceq f_A)$

if A 9 b, 9b ~ he and he la) ^ ~'{he ^ Ia)

(/a ~ 9b, 9b ~ he and 93 h) ^ ^{he -< /a) and -■(/a ^ he) 

D2) (weak ST). For every $f_A, g_B, h_C \in \mathcal{D}$ such that $A \wedge C = B \wedge C = \emptyset$ we
have:

$f_A \preceq g_B \Rightarrow \neg(g_B \cup h_C \prec f_A \cup h_C)$

if moreover {/a -< he and 93 ~ he) or 93 U he he then 
Ja ^ 9b ^ -'{9b U /ic ^ /a U he)- 

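The relation $\preceq$ induced on $\mathcal{E}$ by $\preceq$ is a weakly additive qualitative probability,
that is, a reflexive binary relation which satisfies the following conditions:

P1) $\emptyset \prec A$ for every $A \in \mathcal{E}$, $A \ne \emptyset$;

P2) $\preceq$ has no intransitive cycles;

P3) for every $A, B, C \in \mathcal{E}$ with $A \wedge C = B \wedge C = \emptyset$:

$A \preceq B \Rightarrow \neg(B \vee C \prec A \vee C)$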

moreover, if B ^ B V C or B ^ C , then 

$A \prec B \Rightarrow \neg(B \vee C \preceq A \vee C)$.

Axioms P1-P3, essentially introduced in [3], are the natural generalization of
those proposed in [4] for a complete binary relation defined on an algebra of
events. As proved in [3], they are necessary (but not sufficient) conditions for
the existence of a coherent conditional probability $P$ defined on $\mathcal{E} \times \mathcal{E}^0$ locally
representing $\preceq$, that is, such that:

$A \preceq B \Rightarrow P(A|B \vee A) \le P(B|B \vee A)$,

$A \prec B \Rightarrow P(A|B \vee A) < P(B|B \vee A)$.

Moreover, as proved in [7], they are sufficient for the existence of a conditional
weakly $(\oplus, \odot)$-decomposable measure.

In [16] a set of axioms defining a generalized qualitative probability is
introduced. These axioms are implied by P1-P3, and so they are a necessary
condition for the existence of a conditional probability locally representing $\preceq$.
Nevertheless we may note that, if a generalized qualitative probability does not satisfy
P1-P3, then it can be locally represented neither by a conditional probability
nor by a weakly $(\oplus, \odot)$-decomposable measure.

For a weakly additive qualitative probability $\preceq$ on $\mathcal{E}$, we consider, for every
event $A$ of $\mathcal{E}$, the set $\mathcal{L}(A)$ of events infinitely less probable than it:

£(A) = {Be£: 3G, ~ F, ^ A, F, C FJ, 
with i = 1, . . . , n and B C Vi £ Pf)- 

We note that if the relation $\preceq$ induced on $\mathcal{E}$ (which is a weakly additive
qualitative probability) is complete and $\mathcal{E}$ is an algebra, then for every $A \in \mathcal{E}$
the set $\mathcal{L}(A)$ coincides with the set $C_A$ (introduced in [4] and independently in
[16]), whose definition clearly expresses the meaning of the events infinitely less
probable than $A$ (for a proof see [3] or [7]):

$C_A = \{B \in \mathcal{E} : A \vee B \sim A \wedge B^c \sim A\}$

Now we can define the set of events with the same order of probability as $A$:

$\mathcal{B}(A) = \{B \in \mathcal{E} : B \notin \mathcal{L}(A) \text{ and } A \notin \mathcal{L}(B)\}$.

If the relation $\preceq$ induced on $\mathcal{E}$ (a weakly additive qualitative probability
satisfying (c4')) is complete, and $\mathcal{E}$ is an algebra, the sets $\mathcal{L}(A)$ and $\mathcal{B}(A)$ satisfy
many structural properties, as proved in [4].

We only recall here that $\{\mathcal{B}(A) : A \in \mathcal{E}\}$ is a partition of $\mathcal{E}$ (independently
of the logical structure of $\mathcal{E}$). Moreover, we note that for every $A, B \in \mathcal{E}$, putting
$C = \max_{\preceq}\{A, B\}$, we have $A \vee B \in \mathcal{B}(C)$.

4 Rational Relations 

We now introduce a rationality condition for a conditional structure $(\mathcal{E}, \mathcal{D}, \preceq)$.
It is stated in terms of inequalities for sums of acts, and so it is essentially an
algebraic condition.

Definition 6 Let $(\mathcal{E}, \mathcal{D}, \preceq)$ be a conditional structure. We say that the relation
$\preceq$ is rational if it satisfies the following condition:

(R) for every $n \in \mathbb{N}$ and $f_{A_i}, g_{B_i} \in \mathcal{D}$ with $f_{A_i} \preceq g_{B_i}$, $k_i > 0$ $(i = 1, \ldots, n)$,

$$\sup \sum_{i=1}^{n} k_i \left( g_i I_{B_i} - f_i I_{A_i} \right) \le 0$$

implies either of the following conditions:

1) $f_{A_i} \sim g_{B_i}$ for every $i = 1, \ldots, n$;

2) if $f_{A_j} \prec g_{B_j}$ for some $j$, then $A_j \vee B_j \in \mathcal{L}\left( \bigvee_{i=1}^{n} (A_i \vee B_i) \right)$.

It is possible to give an interpretation of (R) in terms of a betting scheme.
In fact we may regard $k_i (g_i I_{B_i} - f_i I_{A_i})$ as an exchange between a bookie and
a gambler, which yields an amount $k_i g_i$ to the bookie if $B_i$ happens, and the
amount $k_i f_i$ to the gambler if $A_i$ happens; if $A_i \vee B_i$ is false, the corresponding
bet is annulled. This is betting even money on $g_{B_i}$ versus $f_{A_i}$. Suppose we have
this rule: if $f_{A_i} \preceq g_{B_i}$, $i = 1, \ldots, n$, the bookie should accept any combination of
bets, with $k_i > 0$, on $g_{B_i}$ versus $f_{A_i}$. The relation $\preceq$ is not rational if there exists
one of these combinations with a surely non-positive gain and at least one pair of
conditional acts $f_{A_j} \prec g_{B_j}$ such that the probability of the corresponding bet
not being annulled is not infinitesimal with respect to that of the others.
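
The premise of condition (R) only involves the supremum of a finite combination of bets, which on a finite set of atoms is a maximum. A minimal sketch of that computation follows; the representation of acts as dictionaries from atoms to payoffs and the name `sup_gain` are assumptions made for illustration, not part of the paper.

```python
def sup_gain(bets, atoms):
    """Largest total gain of the bookie over the given atoms.

    Each bet is a tuple (k, f, A, g, B): a stake k > 0, payoff tables f and g
    (atom -> value), and the sets of atoms A and B on which the two acts are
    conditioned. Hypothetical representation.
    """
    def gain(atom):
        total = 0.0
        for k, f, A, g, B in bets:
            won = g[atom] if atom in B else 0.0    # paid to the bookie on B_i
            lost = f[atom] if atom in A else 0.0   # paid to the gambler on A_i
            total += k * (won - lost)
        return total

    return max(gain(atom) for atom in atoms)


atoms = ["a", "b", "n"]                     # assumed atoms
f = {"a": 2.0}                              # f_A pays 2 when A holds
g = {"a": 1.0}                              # g_A pays 1 when A holds
bets = [(1.0, f, {"a"}, g, {"a"})]          # one bet on g_A versus f_A

print(sup_gain(bets, atoms))
```

In this assumed case the gain is never positive, so the premise of (R) holds; a relation declaring $f_A \prec g_A$ here would then have to satisfy one of the two conclusions of (R).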

The following results clarify the connections between rational relations and
rational conditional utility models.

Proposition 2 Let $(\mathcal{E}, \mathcal{D}, \preceq)$ be a conditional structure, and let $\preceq$ satisfy (R).
Then $\preceq$ satisfies (D1), (D2), and the relation induced on $\mathcal{E}$ is a weakly
additive qualitative probability.

Theorem 2 Let $\mathcal{D}$ be a finite set of conditional acts and $(\mathcal{E}, \mathcal{D}, \preceq)$ a conditional
structure.

The following statements are equivalent:

i) $\preceq$ satisfies (R);

ii) there exists a rational conditional utility model $(\mathcal{E}, \mathcal{D}, P, u)$.

Proof. We give here only a sketch of the proof, highlighting the main
steps for the actual construction of the numerical model.



Let $E_0^0 = \bigvee \{A : A \in \mathcal{E}\}$, and let $\mathcal{D}_0$ be the subset of $\mathcal{D}$ containing the acts
conditioned on the elements of $\mathcal{B}(E_0^0)$. We denote by $\preceq_0$ the relation on $\mathcal{D}$ defined by

putting: fA Ao 9b if fA A 9b and AM B G and ^ G ^ \ ^o- 

^0 denotes the set of atoms generated by the events in which the conditional acts 
assume significant values („4 q is finite because V,X). Consider now the following 
linear system S'o, where the unknown is the m- vector Wq = {Wq, . . . , is 

the cardinality of the set ^o); * denotes the operation of scalar product, 
denotes the act which assumes the same value of /* if Ai is true, and is null if 
Ai is false: 

r Wo * - pi A,) > 0 if Pa^ ^0 9 b, 

(So) <Wo* {kUp. - hUpP = 0 if ~o k^ 

[ Wo > 0 Wo PO 

By using a well-known theorem of the alternative (see, for instance, [12]), it is
easy to prove that $S_0$ has a solution if (and only if) the following system $S_0'$ has
no solution:

J E - PIaP - hUpP < 0 

>0, Ea* >0> 

Interchanging and kp_, if necessary, we can assume that all fifis are not 
negative. It is easy to see that S'^ has a solution if (and only if) ^ does not 
satisfy condition (R). 

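The function $w_0 : \mathcal{D} \to \mathbb{R}$ defined by putting, for any $f_A \in \mathcal{D}$,

$$w_0(f_A) = \frac{W_0 * f I_A}{W_0 * I_{E_0^0}}$$

represents $\preceq_0$, since it satisfies system $S_0$.

The function $P_0 : \mathcal{E} \to \mathbb{R}$ defined by putting, for any $A \in \mathcal{E}$,

$$P_0(A) = \frac{W_0 * I_A}{W_0 * I_{E_0^0}}$$

is a probability distribution.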
We note that $w_0(I_A) = P_0(A)$ for every $A \in \mathcal{E}$. Now we inductively define
$\mathcal{E}_k = \mathcal{L}(E_{k-1}^0)$, $E_k^0 = \bigvee \{A : A \in \mathcal{E}_k\}$ for $k \ge 1$. Let $\mathcal{D}_k$ be the subset of $\mathcal{D}$
containing the acts conditioned on the elements of $\mathcal{E}_k$, and $\mathcal{D}_k'$ the subset of $\mathcal{D}_k$
containing the acts conditioned on the elements of $\mathcal{B}(E_k^0)$.

We denote by :<k the relation on T>k defined by putting: gs if Ja A 9b 

and AU B G B{E'^), and /a It/t, for every /a & 'DkX'B'k- Following the same 
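procedure as for $\preceq_0$, we obtain a function $w_k : \mathcal{D}_k \to \mathbb{R}$ defined by putting, for any
$f_A \in \mathcal{D}_k$,

$$w_k(f_A) = \frac{W_k * f I_A}{W_k * I_{E_k^0}}$$

which represents $\preceq_k$. We also define the function $P_k : \mathcal{E}_k \to \mathbb{R}$ by putting, for
every $A \in \mathcal{E}_k$,

$$P_k(A) = \frac{W_k * I_A}{W_k * I_{E_k^0}}$$

which is a probability distribution.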

After a finite number $h$ of steps, we get that $\mathcal{L}(E_h^0)$ contains only the
impossible event $\emptyset$. Let $S_0, \ldots, S_h$ be the corresponding systems: if $\preceq$ does not
satisfy (R), it is easy to see that one of the systems $S_0, \ldots, S_h$ has a subsystem
without solutions.

So $\preceq$ satisfies the condition of rationality if (and only if) $S_0, \ldots, S_h$ have a
solution.

The existence of a solution for $S_0, \ldots, S_h$ implies the existence of a rational
conditional utility model for $(\mathcal{E}, \mathcal{D}, \preceq)$: for every $f_A \in \mathcal{D}$, we define



$$u(f_A) = \frac{w_k(f_A)}{w_k(I_A)}$$

where $k$ is the natural number such that $A \in \mathcal{B}(E_k^0)$.
Then, for every $A | A \vee B$ with $A, B \in \mathcal{E}$, we define



$$P(A | A \vee B) = \frac{P_s(A)}{P_s(A \vee B)}$$

where $s$ is such that $A \vee B \in \mathcal{B}(E_s^0)$. It is easy to see that $u$ and $P$ are well
defined, and that, for every $A \in \mathcal{E}$, one has $u(I_A) = 1$. Moreover, by using
Theorem 1 we can show that $P$ is a coherent conditional probability. Along
the same lines it is possible to prove that the function $w$ defined by putting, for any
$f_A \in \mathcal{D}$, $C \in \mathcal{E}$, with $C \supseteq A$,



$$w(f_A, C) = u(f_A) \, \frac{P_\alpha(A)}{P_\alpha(C)}$$

where $\alpha$ is such that $C \in \mathcal{B}(E_\alpha^0)$, is a coherent conditional prevision. It is
immediate to prove that $w$ locally represents $\preceq$.

Vice versa, if there exists a rational conditional utility model for $\mathcal{D}$, then any
system $S_\alpha$ has a solution

$$W_\alpha = (w(I_{C_1}), \ldots, w(I_{C_m})) \qquad (\alpha = 0, \ldots, n)$$

where $C_1, \ldots, C_m$ are the atoms of $\mathcal{A}_\alpha$.

We notice that the steps of the construction of the model, given in the proof,
single out in a natural way the steps of an algorithm to check the rationality of a
preference relation on a finite set of conditional acts.

5 Strongly Rational Relations 

We notice that for an infinite set of conditional acts, condition (R) is not sufficient
for the existence of a rational conditional utility model $(\mathcal{E}, \mathcal{D}, P, u)$ for $(\mathcal{E}, \mathcal{D}, \preceq)$.
We now introduce a condition of strong rationality (SR).

Definition 7 Let $(\mathcal{E}, \mathcal{D}, \preceq)$ be a conditional structure. We say that the relation
$\preceq$ is strongly rational if it satisfies the following condition:

(SR) for every $f_{A_i} \preceq g_{B_i}$ there exists a $\delta_i \ge 0$, with $\delta_i > 0$ for $f_{A_i} \prec g_{B_i}$, such that
for every $n \in \mathbb{N}$, $f_{A_i}, g_{B_i}$, $k_i > 0$, one has that

$$\sup \sum_{i=1}^{n} k_i \left( g_i I_{B_i} - f_i I_{A_i} - \delta_i I_{A_i \vee B_i} \right) \le 0$$

implies either of the following conditions:

1) $f_{A_i} \sim g_{B_i}$ for every $i = 1, \ldots, n$;

2) if $f_{A_j} \prec g_{B_j}$ for some $j$, then $A_j \vee B_j \in \mathcal{L}\left( \bigvee_{i=1}^{n} (A_i \vee B_i) \right)$.

It is possible to give an interpretation of (SR) (in terms of a betting scheme)
similar to that for condition (R), by regarding $\delta_i$ as a penalty that one must pay
to bet on a preferred conditional act.

It is immediate to prove the following result:

Proposition 3 Let $(\mathcal{E}, \mathcal{D}, \preceq)$ be a conditional structure. If $\preceq$ satisfies (SR) then
$\preceq$ satisfies (R).

Strong rationality characterizes the relations on an arbitrary set of conditional
acts which admit a rational conditional utility model:

Theorem 3 Let $\mathcal{D}$ be an arbitrary set of conditional acts and $(\mathcal{E}, \mathcal{D}, \preceq)$ a
conditional structure. The following statements are equivalent:

i) $\preceq$ satisfies (SR);

ii) there exists a rational conditional utility model $(\mathcal{E}, \mathcal{D}, P, u)$.

The theorem can be proved by using a compactification theorem and the
following

Lemma 1 Let $\mathcal{D}$ be an arbitrary set of conditional acts and $(\mathcal{E}, \mathcal{D}, \preceq)$ a
conditional structure. The following statements are equivalent:

i) $\preceq$ satisfies (SR);

ii) for every finite set $\mathcal{F} \subseteq \mathcal{D}$, there exists a rational conditional utility model
$(\mathcal{E}_\mathcal{F}, \mathcal{F}, P_\mathcal{F}, u_\mathcal{F})$ for $(\mathcal{E}_\mathcal{F}, \mathcal{F}, \preceq_\mathcal{F})$ such that, for every $f_A, g_B \in \mathcal{F}$ with $f_A \prec g_B$,
we have



uj^{gB)Pj^{B\A \J B) — u^{f a)Pj^{A\A V B) > St. 
Abstract. Our starting point is the approach to probabilistic logic
through coherence, but we give up de Finetti’s idea of a conditional
event $E|H$ being a 3-valued entity, with the third value being just an
undetermined common value for all ordered pairs $(E, H)$. We let instead
the “third” value of $E|H$ suitably depend on the given pair. In this way
we get, through a direct assignment of conditional probability, a general
theory of probabilistic reasoning able to encompass other approaches to
uncertain reasoning, such as fuzziness and default reasoning. We are also
able to put forward a meaningful concept of conditional independence,
which avoids many of the usual inconsistencies related to logical
dependence. We give an example in which we put together different kinds of
information and show how coherent conditional probability can act as a
unifying tool.



1 Introduction 

Our general aim is that of synthesizing our research in the field of probabilistic 
reasoning that has been published (going back to papers such as [15] and [2]) 
in diverse sources or that has been developed from diverse points of view. In 
fact this program will be fully pursued in a forthcoming paper [9]: so, with the 
limited space which is available here, we just try to convey some aspects of the 
main ideas and methodologies ruling our approach. A cornerstone is the
concept of coherence, which allows one to define conditional probability on an arbitrary
family of conditional events. The starting point is a synthesis of the available
information expressed by one or more events: to this purpose, the concept of
event must be given its most general meaning, i.e. it must not be looked on just as
a possible outcome (a subset of the so-called “sample space”), but as expressed by
a proposition. Moreover, events play a two-fold role, since we must consider not
only those events which are the direct object of study, but also those which
represent the relevant “state of information”: in fact conditional events are the tools
that allow one to manage specific (conditional) statements and to update (through
conditional probability) degrees of belief on the basis of the evidence.

What is usually emphasized in the literature, when a conditional probability
$P(E|H)$ is taken into account, is only the fact that $P(\cdot|H)$ is a probability
for any given $H$: this is a very restrictive (and misleading) view of conditional
probability, corresponding trivially to just a modification of the “world” $\Omega$.

It is instead essential to regard the conditioning event $H$ as a “variable”, i.e.
the “status” of $H$ in $E|H$ is not just that of something representing a given fact,
but that of an (uncertain) event (like $E$) for which the knowledge of its truth
value is not required (this means, using a terminology due to Koopman [19],
that $H$ must be looked on as being contemplated, even if asserted; similar terms
are, respectively, assumed versus acquired). So, even if beliefs may come from
various sources, they can be treated in the same way and can be measured by
probability, since the relevant events (including statistical data!) can always be
considered as being assumed propositions. In particular, the “statistical” concept
of likelihood is nothing else than a conditional probability seen as a function of
the conditioning event.

The concept of conditional event (as dealt with in this paper) plays a central
role in probabilistic reasoning. We generalize (or better, in a sense, we give
up) de Finetti's idea of looking at a conditional event $E|H$, with $H \ne \emptyset$ (the
impossible event), as a 3-valued logical entity (true when both $E$ and $H$ are true,
false when $H$ is true and $E$ is false, “undetermined” when $H$ is false) by letting
the third value suitably depend on the given ordered pair $(E, H)$ and not being
just an undetermined common value for all pairs: it turns out (as explained in
detail in [5]) that this function can be seen as a measure of the degree of belief
in the conditional event $E|H$, which under “natural” conditions reduces to the
conditional probability $P(E|H)$, in its most general sense related to the concept
of coherence, and satisfying the classic axioms as given by de Finetti [11], Renyi
[28], Krauss [20], Dubins [12] (see Section 2). So, taking de Finetti’s approach
as starting point is not just a “semantic” attitude in favour of the subjectivist
position. Rather it is mainly a way of exploiting the “syntactic” advantages of
this view by resorting to an operational procedure (based on linear systems, see
Section 4) which allows one to consider, for example, partial probability assessments
(numerical or comparative) on an arbitrary set of conditional events and to make
inferences with respect to any “new” event or information.

A relevant theory (but only for unconditional events) is the probabilistic logic 
by N.J. Nilsson [22], which is just a re-phrasing (with different terminology) of 
de Finetti’s theory, as Nilsson himself acknowledges in [23]. 

We list some peculiarities (which entail a large flexibility in the management 
of any kind of uncertainty) of this concept of coherent conditional probability 
versus the usual one: 

- due to its direct assignment as a whole, the knowledge (or the assessment) of
the “joint” and “marginal” unconditional probabilities $P(E \wedge H)$ and $P(H)$
is not required;

- the conditioning event $H$ (which must be a possible one) may have zero
probability, so that (as shown for example in [3]) the class of admissible
conditional probability assessments and that of possible extensions are larger
(and the ensuing algorithms are more flexible). Moreover, in the assignment
of $P(E|H)$ we are driven by coherence, contrary to what is done in those
treatments (for example, that by Frisch and Haddawy in [13]: see Section 3)
where the relevant conditional probability is given an arbitrary value in the
case of a conditioning event of zero probability;

- it allows a management of stochastic independence (conditional or not) which
avoids many of the usual inconsistencies related to logical dependence. The
latter situation may arise in the usual probabilistic approaches which resort
to graphical models (as, e.g., [10], [35], [21]);

- a suitable interpretation of its extreme values 0 and 1 for situations which
are different, respectively, from the trivial ones $E \wedge H = \emptyset$ and $H \subseteq E$, leads
to a “natural” treatment of default reasoning;

- it is possible to represent and manage “vague” statements such as those of fuzzy
theory (but there is no room here to discuss these aspects: see [6], [8]).

2 Conditional Events and Conditional Probability 

We shall refer to a definition of conditional event extensively discussed, e.g., in
[5]. We just recall that a conditional event $E|H$ can be represented as a random
variable (denoting by $I_A$ the indicator of the event $A$)

(1) $X = 1 \cdot I_{E \wedge H} + 0 \cdot I_{E^c \wedge H} + t(E|H) \cdot I_{H^c}$,

so that we have $X = 1$ when both $E$ and $H$ are true, $X = 0$ when $H$ is true
and $E$ is false, and $X = t(E|H)$ when $H$ is false (in particular, we have only two
values, 1 and 0, when $H = \Omega$, the certain event).
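
The three cases in (1) can be written as a small function. This is only an illustrative sketch: the function name and the way truth values are passed in are assumptions, and `t` stands for whatever value $t(E|H)$ has been assigned to the pair.

```python
def conditional_event_value(e_true, h_true, t):
    """Value of X = 1*I_{E and H} + 0*I_{not E and H} + t(E|H)*I_{not H}."""
    if h_true:
        return 1.0 if e_true else 0.0
    return t  # third value: depends on the pair (E, H) through t(E|H)


print(conditional_event_value(True, True, 0.3))    # E and H both true
print(conditional_event_value(False, True, 0.3))   # H true, E false
print(conditional_event_value(True, False, 0.3))   # H false
```

When $H$ is the certain event the last branch is never reached, which matches the remark that only the values 1 and 0 remain.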

By requiring the closure of suitable operations introduced inside this
particular class of random variables (which is an arbitrary family $\mathcal{C}$ of conditional
events), we get “automatically” (so to say) conditions on $t(E|H)$. Choosing as
operations in $\mathcal{C}$ the ordinary sum and product, $t(E|H)$ satisfies the classic axioms
for a conditional probability, which read (if the set $\mathcal{C} = \mathcal{G} \times \mathcal{B}^0$ of conditional
events $E|H$ is such that $\mathcal{G}$ is a Boolean algebra and $\mathcal{B} \subseteq \mathcal{G}$ is closed with respect
to (finite) logical sums, i.e. disjunctions, with $\mathcal{B}^0 = \mathcal{B} \setminus \{\emptyset\}$) as follows:

(i) $P(H|H) = 1$, for every $H \in \mathcal{B}^0$,

(ii) $P(\cdot|H)$ is a (finitely additive) probability on $\mathcal{G}$ for any given $H \in \mathcal{B}^0$,

(iii) $P((E \wedge A)|H) = P(E|H) \cdot P(A|(E \wedge H))$, for any $A, E \in \mathcal{G}$, $H, E \wedge H \in \mathcal{B}^0$.

Putting $P(\cdot|H) = P_H(\cdot)$, property (iii) can be written

(2) $P_H(E \wedge A) = P_H(E) \cdot P_{E \wedge H}(A)$.

This means that a conditional probability $P_H(\cdot)$ is not singled out by its
conditioning event $H$, since its values are bound to suitable values of another
conditional probability, i.e. $P_{E \wedge H}(\cdot)$. Then $P_H(\cdot)$ cannot be assigned (so to say)
“autonomously” (recall the comments already made in Section 1 about the
misleading way of looking at conditional probability).

On the other hand, if $P_\Omega(\cdot) = P(\cdot)$ is strictly positive on $\mathcal{B}^0$, we can write,
putting $H = \Omega$ in (2),

$P(E \wedge A) = P(E) \cdot P(A|E)$.

Then, in this case, all conditional probabilities $P(\cdot|E)$, for any $E$, are uniquely
determined by a single “unconditional” $P$ (Kolmogorov approach), while in
general (see Theorem 3 in Section 4) we need a class of probabilities $P_\alpha$ to
represent the “whole” conditional probability.

3 Popper Measure 

In the relevant literature there is no agreement on the definition of conditional
measures when the conditioning event has zero probability. Popper too, in [25]
(pp. 332-335), emphasizes the need for this more general view of conditional
probability. He gives the following definition (we refer to the version presented
in [17], p. 84):

Definition 1 - A Popper measure $P$ on $\mathcal{G} \times \mathcal{G}$ (where $\mathcal{G}$ is a Boolean algebra)
is a function satisfying the following axioms:

1. $0 \le P(B|A) \le P(A|A) = 1$ for all $A, B \in \mathcal{G}$,

2. if $P(A|B) = 1 = P(B|A)$, then $P(C|A) = P(C|B)$ for all $C \in \mathcal{G}$,

3. if $P(C|A) \ne 1$, then $P(B^c|A) = 1 - P(B|A)$ for all $B \in \mathcal{G}$,

4. $P(A \wedge B|C) = P(A|C) P(B|A \wedge C)$ for all $A, B, C$ in $\mathcal{G}$,

5. $P(A \wedge B|C) \le P(B|C)$ for all $A, B, C$ in $\mathcal{G}$,

6. there are two events $A$ and $B$ in $\mathcal{G}$ such that $P(A|B) < P(A|A)$.

Since a conditional probability $P$ is defined on $\mathcal{G} \times \mathcal{B}^0$, to compare it with a
Popper measure we need to extend $P$ (with $\mathcal{B} = \mathcal{G}$) to $\mathcal{G} \times \mathcal{G}$, i.e. to allow also $\emptyset$ as
conditioning event: notice that the only extension $P^*$ (still dubbed “conditional
probability”) compatible with Popper measure (cf. axiom 1) is $P^*(A|\emptyset) = 1$ for
any $A \in \mathcal{G}$. In [9] we prove the following

Theorem 1 - A conditional probability $P^*$ on $\mathcal{G} \times \mathcal{G}$ is a Popper measure.

The following example shows that the converse is not true: in fact, when
$P(H) = 0$, a Popper measure $P(\cdot|H)$ may not be a probability.

Example 1 - Let $\mathcal{E} = \{A_1, A_2, A_3\}$ be a partition of $\Omega$ and denote by $\mathcal{G}$ the
algebra spanned by $\mathcal{E}$. Putting $P(A_1 \vee A_2|\Omega) = 0$, $P(A_3|\Omega) = 1$, define on $\mathcal{G}$ the
measures $P(\cdot|A_1 \vee A_2)$, $P(\cdot|A_1)$, $P(\cdot|A_2)$, $P(\cdot|\emptyset)$ constantly equal to 1. Hence,
the other values are uniquely determined by Popper axioms. This is a Popper
measure, but $P(\emptyset|A_1 \vee A_2) = 1$ and $P(A_1 \vee A_2|A_1 \vee A_2) \ne P(A_1|A_1 \vee A_2) + P(A_2|A_1 \vee A_2)$.

Notice that this “unpleasant” consequence occurs, even more so, for all
definitions that assign arbitrary values to $P(\cdot|H)$, as done, for example, by Frisch and
Haddawy in [13]. The way out is through the following modification of axiom 3,
which should read: $P(B^c|A) = 1 - P(B|A)$ for all $B \in \mathcal{G}$ and $A \ne \emptyset$.

4 Coherent Conditional Probability and Extensions 

In Section 2, conditional probability $P$ has been defined on $\mathcal{G} \times \mathcal{B}^0$; however it is
possible, through the concept of coherence, to handle also those situations where
we need to assess $P$ on an arbitrary set $\mathcal{C}$ of conditional events.

Definition 2 - The assessment $P(\cdot|\cdot)$ on $\mathcal{C}$ is coherent if there exists $\mathcal{C}' \supseteq \mathcal{C}$,
with $\mathcal{C}' = \mathcal{G} \times \mathcal{B}^0$, such that $P(\cdot|\cdot)$ can be extended from $\mathcal{C}$ to $\mathcal{C}'$ as a conditional
probability.

A characterization of coherence is given (see, e.g., [3]) by the following

Theorem 3 - Let $\mathcal{C}$ be an arbitrary finite family of conditional events
$E_1|H_1, \ldots, E_n|H_n$ and let $\mathcal{A}_0$ denote the set of atoms generated by the
(unconditional) events $E_1, H_1, \ldots, E_n, H_n$. For a real function $P$ on $\mathcal{C}$ the following
two statements are equivalent:

(i) $P$ is a coherent conditional probability on $\mathcal{C}$;

(ii) there exists (at least) a class of probabilities $\{P_0, P_1, \ldots, P_k\}$, each
probability $P_\alpha$ being defined on a suitable subset $\mathcal{A}_\alpha \subseteq \mathcal{A}_0$, such that for any $E_i|H_i \in \mathcal{C}$
there is a unique $P_\alpha$ with

(3) $$\sum_{A_r \subseteq H_i} P_\alpha(A_r) > 0, \qquad P(E_i|H_i) = \frac{\sum_{A_r \subseteq E_i \wedge H_i} P_\alpha(A_r)}{\sum_{A_r \subseteq H_i} P_\alpha(A_r)};$$

moreover $\mathcal{A}_{\alpha'} \subset \mathcal{A}_{\alpha''}$ for $\alpha' > \alpha''$ and $P_{\alpha''}(A_r) = 0$ if $A_r \in \mathcal{A}_{\alpha'}$.

According to Theorem 3, a coherent conditional probability gives rise to a
suitable class $\{P_0, P_1, \ldots, P_k\}$ of “unconditional” probabilities.

Where do the above classes of probabilities come from? Since $P$ is coherent,
there exists an extension $P_0$ on $\mathcal{G} \times \mathcal{G}^0$, where $\mathcal{G}$ is the algebra generated by the
set $\mathcal{A}_0$ of atoms: then, putting $\mathcal{B} = \{\Omega, \emptyset\}$, its restriction to $\mathcal{A}_0 \times \mathcal{B}^0$ satisfies (3)
with $\alpha = 0$ for any $E_i|H_i$ such that $P_0(H_i) > 0$. The subset $\mathcal{A}_1 \subset \mathcal{A}_0$ contains
only the atoms $A_r$ contained in the union of the $H_i$'s with $P_0(H_i) = 0$ (and so on: see the
system below).

Conversely, given an assessment $P$ on the family $\mathcal{C}$, let $P_0$ be a probability
on the algebra $\mathcal{G}$ agreeing with the conditional assessment (in the sense that,
for every conditional event of $\mathcal{C}$, (ii) is satisfied for $H_i = \Omega$). Let $\mathcal{G}_1 \subset \mathcal{G}$ be the
subalgebra of events $E \in \mathcal{G}$ such that $P_0(E) = 0$; we can define a new probability
$P_1$ on $\mathcal{G}_1$, agreeing with the conditional assessment whose conditioning event is
in $\mathcal{G}_1$, and consider $\mathcal{G}_2 \subset \mathcal{G}_1$ such that $P_1(E) = 0$, and so on. It is clear that
in this process, given $\beta > \alpha$, the assignment of the probability $P_\beta$ is in no way
bound by the probability $P_\alpha$ (except for the domain), but the only constraints
are given by the conditional assessments. So, starting from the class $\mathcal{P} = \{P_\alpha\}$,
the function $P(\cdot|\cdot)$ on $\mathcal{G} \times \mathcal{G}^0$ defined by putting, for any $E|H \in \mathcal{G} \times \mathcal{G}^0$,

$$P(E|H) = \frac{P_\alpha(E \wedge H)}{P_\alpha(H)}$$

where $\alpha$ is the index such that $P_\alpha(H) > 0$, is a conditional probability on $\mathcal{G} \times \mathcal{G}^0$
(as proved, for example, in [5]).
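
The passage from a class $\{P_\alpha\}$ to a conditional probability can be sketched in a few lines. The representation below is an assumption made for illustration: each $P_\alpha$ is a dictionary from atoms to weights, events are sets of atoms, and the class is an ordered list in which the first table giving positive probability to $H$ is the one used.

```python
def conditional_probability(layers, E, H):
    """P(E|H) = P_alpha(E and H) / P_alpha(H), with alpha the first index
    such that P_alpha(H) > 0. `layers` is a hypothetical list of dicts
    mapping atoms to probabilities; E and H are sets of atoms."""
    for weights in layers:
        p_h = sum(w for atom, w in weights.items() if atom in H)
        if p_h > 0:
            p_eh = sum(w for atom, w in weights.items()
                       if atom in E and atom in H)
            return p_eh / p_h
    raise ValueError("no probability in the class is positive on H")


# Assumed class on a partition {a1, a2, a3}: P_0 puts all mass on a3,
# P_1 is defined on the atoms that P_0 sends to zero.
layers = [{"a3": 1.0}, {"a1": 0.25, "a2": 0.75}]
omega = {"a1", "a2", "a3"}

print(conditional_probability(layers, {"a1"}, omega))          # uses P_0
print(conditional_probability(layers, {"a1"}, {"a1", "a2"}))   # uses P_1
```

The second call shows the point of the construction: conditioning on an event of zero probability under $P_0$ is still well defined, because the value is read off $P_1$.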

The proof of the equivalence between conditions (i) and (ii) gives rise to an
algorithm to test the coherence of the assessment $P$, based on the equivalence
between condition (ii) and the compatibility of a sequence of systems $(S_\alpha)$ with
unknowns $P_\alpha(A_r) \ge 0$, $A_r \in \mathcal{A}_\alpha$,

$$(S_\alpha) \quad \begin{cases} \displaystyle\sum_{A_r \subseteq E_i \wedge H_i} P_\alpha(A_r) = P(E_i|H_i) \sum_{A_r \subseteq H_i} P_\alpha(A_r) & [\text{if } P_{\alpha-1}(H_i) = 0], \\[2ex] \displaystyle\sum_{A_r \subseteq H_0^\alpha} P_\alpha(A_r) = 1 \end{cases}$$

where $P_{-1}(H_i) = 0$ for all $H_i$'s, and $H_0^\alpha$ denotes, for $\alpha \ge 0$, the union of the
$H_i$'s such that $P_{\alpha-1}(H_i) = 0$; so, in particular, $H_0^0 = H_0 = H_1 \vee \ldots \vee H_n$.

Any class $\{P_\alpha\}$ singled out by condition (ii) is said to agree with the
conditional probability $P$. Notice that in general there are infinitely many classes of
probabilities $\{P_\alpha\}$; in particular we have only one class in the case that $\mathcal{C}$ is a
product of Boolean algebras.

Another fundamental result is the following, essentially due (for
unconditional events, and referring to an equivalent form of coherence in terms of a betting
scheme) to de Finetti [11] (and extended to conditional events in [18], [26]).

Theorem 2 - Let $\mathcal{C}$ be a family of conditional events and $P$ a corresponding
assessment; then there exists a (possibly not unique) coherent extension of $P$ to
an arbitrary family $\mathcal{K} \supseteq \mathcal{C}$, if and only if $P$ is coherent on $\mathcal{C}$.

In particular, given a “new” conditional event $E|H$, i.e. if $\mathcal{K} = \mathcal{C} \cup \{E|H\}$,
then the coherent assessments of $p = P(E|H)$ are all the values of a suitable closed
interval $[p', p''] \subseteq [0, 1]$, with $p' \le p''$. For a finite $\mathcal{C}$, in [3], [5] the two bounds
$p'$ and $p''$ have been characterised as infimum and supremum, respectively, of
probabilities $P(E_*|H_*)$ and $P(E^*|H^*)$ of suitable conditional events. The
relevant algorithm, based on linear programming techniques, has been implemented
in [1].

Notice that the problem of “updating” the priors $P(H_i)$ into the posteriors
$P(H_i|E)$ (through Bayes’ theorem, for a set of exhaustive and mutually exclusive
hypotheses $H_i$) is just a very particular case of a coherent extension (see [7]):
this Bayesian extension is unique and can be computed only if $P(E) > 0$.
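
For comparison with the general extension problem, the Bayesian special case is a one-line computation once $P(E) > 0$. The sketch below uses a hypothetical function name and made-up numbers; it refuses to answer when $P(E) = 0$, which is exactly the situation where Bayes' theorem gives no unique value.

```python
def bayes_posteriors(priors, likelihoods):
    """Posteriors P(H_i|E) from priors P(H_i) and likelihoods P(E|H_i),
    for exhaustive and mutually exclusive hypotheses H_i."""
    p_e = sum(p * l for p, l in zip(priors, likelihoods))
    if p_e <= 0:
        raise ValueError("P(E) = 0: Bayes' theorem does not fix the posteriors")
    return [p * l / p_e for p, l in zip(priors, likelihoods)]


# Illustrative numbers only.
posteriors = bayes_posteriors([0.5, 0.5], [0.8, 0.2])
print(posteriors)
```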



5 Zero-layers and Spohn’s ranking function 

Given a class $\mathcal{P} = \{P_\alpha\}$ agreeing with a conditional probability, it is possible
to define the zero-layer $\circ(H)$ of an event $H$ as

$\circ(H) = \alpha$ if $P_\alpha(H) > 0$,

and the zero-layer of a conditional event $E|H$ as

$\circ(E|H) = \circ(E \wedge H) - \circ(H)$.

Obviously, for the certain event $\Omega$ and for any event $E$ with positive probability,
we have $\circ(\Omega) = \circ(E) = 0$ (so that, if the class contains only an everywhere
positive probability $P_0$, there is only one (trivial) zero-layer, i.e. $\alpha = 0$), while
the zero-layer $\circ(\emptyset)$ is greater than that of any possible event (so that we put
$\circ(\emptyset) = +\infty$).
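
Zero-layers are easy to compute once a class $\{P_\alpha\}$ is given. The following sketch assumes the same hypothetical representation as before (an ordered list of dictionaries from atoms to probabilities, events as sets of atoms); the function names are illustrative.

```python
import math


def zero_layer(layers, event):
    """Index alpha of the first P_alpha giving positive probability to the
    event; +infinity for the impossible event (no such index)."""
    for alpha, weights in enumerate(layers):
        if any(w > 0 for atom, w in weights.items() if atom in event):
            return alpha
    return math.inf


def zero_layer_conditional(layers, E, H):
    """Zero-layer of E|H, i.e. layer(E and H) - layer(H), for possible H."""
    layer_h = zero_layer(layers, H)
    if layer_h == math.inf:
        raise ValueError("the conditioning event must be possible")
    return zero_layer(layers, E & H) - layer_h


# Assumed class on a partition {a1, a2, a3}.
layers = [{"a3": 1.0}, {"a1": 0.5, "a2": 0.5}]
omega = {"a1", "a2", "a3"}

print(zero_layer(layers, omega))                      # certain event
print(zero_layer(layers, {"a1"}))                     # event of probability 0
print(zero_layer_conditional(layers, {"a1"}, omega))  # a1 given the certain event
```

On this assumed class the layer of a disjunction equals the smaller of the two layers, for instance for `{"a1"}` and `{"a3"}`.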

For an example, see Section 6.2, where we will show the crucial role played 
by zero-layers for the concept of conditional independence. 

On the other hand, Spohn (see, for example, [31], [32]) considers degrees of 
plausibility defined via a ranking function, that is a map k that assigns to each 
possible proposition a natural number (its rank) such that 

(a) either k{A) = 0 or k{A*^) = 0 (or both) ; 

(b) /t(A V B) = min{/t(A), /t(B)} ; 

(c) for all A A B 0 the conditional rank of B given A is 
k{B\A) = k{AaB) -k{A) . 

Ranks represent degrees of “disbelief”. For example, $A$ is not disbelieved 
iff $\kappa(A) = 0$, and it is disbelieved iff $\kappa(A) > 0$. Ranking functions are seen by 
Spohn as a tool to manage plain belief and belief revision, since he maintains that 
probability is inadequate for this purpose. In our framework this claim can be 
challenged, since our tools for belief revision are coherent conditional probabilities 
and the ensuing concept of zero-layers: it is easy to check that zero-layers have 
the same formal properties as ranking functions. 
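
The claim that zero-layers behave like ranks can be checked mechanically on a toy case. The sketch below is an assumption-laden illustration, not a proof: it fixes a hypothetical three-layer class on three atoms and verifies properties (a) and (b) exhaustively over all events (property (c) holds by the very definition of the zero-layer of a conditional event).

```python
from fractions import Fraction
from itertools import combinations
from math import inf

ATOMS = ("a", "b", "c")
# Hypothetical class: each layer concentrates its mass on one atom.
LAYERS = [{"a": Fraction(1)}, {"b": Fraction(1)}, {"c": Fraction(1)}]

def zero_layer(event):
    """First layer giving the event positive probability; +inf for no layer."""
    for alpha, layer in enumerate(LAYERS):
        if any(layer.get(atom, 0) > 0 for atom in event):
            return alpha
    return inf

OMEGA = frozenset(ATOMS)
EVENTS = [frozenset(c) for n in range(len(ATOMS) + 1)
          for c in combinations(ATOMS, n)]

# (a) either the event or its contrary has rank 0
property_a = all(min(zero_layer(A), zero_layer(OMEGA - A)) == 0 for A in EVENTS)
# (b) the rank of a disjunction is the minimum of the ranks
property_b = all(zero_layer(A | B) == min(zero_layer(A), zero_layer(B))
                 for A in EVENTS for B in EVENTS)
print(property_a, property_b)
```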

A brief discussion of plain belief is in Section 6.1. 

6 Coherent Conditional Probability as Unifying Tool 

In this Section we discuss briefly how to handle, by means of coherent conditional 
probability, some aspects of default reasoning (see, e.g., [27], [29]) and conditional 
independence. 

In Section 6.3 we consider an example in which we put together different 
kinds of information and show how coherent conditional probability can act as 
a unifying tool. 

6.1 On the default reasoning 

First of all, we show that a sensible use of events whose probability is 0 (or 1) 
can be a more general tool in revising beliefs when new information comes to 
the fore. 

We can challenge (see also [14]) a claim contained in [30] that probability is 
inadequate for revising plain belief: “‘I believe A is true’ cannot be represented by 
$P(A) = 1$ because a probability equal to 1 is incorrigible, that is, $P(A|B) = 1$ for 
all $B$ such that $P(A|B)$ is well defined. However, plain belief is clearly corrigible. 
I may believe it is snowing outside but when I look out the window and observe 
that it has stopped snowing, I now believe that it is not snowing outside”. In the 
usual framework, the above reasoning is correct, since $P(B) > 0$ implies that 
there are no logical relations between $B$ and $A$ (in particular, $A \wedge B \neq \emptyset$) 
and $P(A|B) = 1$. Taking instead $P(B) = 0$, we may have $A \wedge B = \emptyset$ and 
so $P(A|B) = 0$. On the other hand, taking $B$ = “looking out the window, one 
observes that it is not snowing” (again assuming $P(B) = 0$), and putting $A$ = “it 
is snowing outside”, we can put $P(A) = 1$ to express a strong belief in $A$. Then 
it is clearly possible to assess coherently $P(A|B) = p$ for every $p \in [0, 1]$. So, 
contrary to the claim in [30], a probability equal to 1 can be updated. 

Now we discuss briefly (exploiting the possibility of updating, in our setting, 
a probability equal to 1) default reasoning, by referring to the classic ex- 
ample of Tweety. As recalled in Section 1, we may deal with the extreme value 
$P(E|H) = 1$ also for situations which are different from the trivial one $H \subseteq E$: 
the latter can be anyway useful to express that a penguin ($\mathcal{P}$) is certainly a 
bird ($B$), i.e. $\mathcal{P} \subseteq B$, by putting $P(B|\mathcal{P}) = 1$; moreover we know that usually 
Tweety ($T$) is a penguin, and this fact can be represented by $P(\mathcal{P}|T) = 1$. 

But we can express as well the statement “a penguin usually does not fly” (we 
denote by $\mathcal{F}^c$ the contrary of $\mathcal{F}$, the latter symbol denoting “flying”) by writing 
$P(\mathcal{F}^c|\mathcal{P}) = 1$. Then the question “can Tweety fly?” can be faced through an 
assessment of the conditional probability $P(\mathcal{F}|T)$, which must be coherent with 
the already assessed ones: by Theorem 3, it can be shown that any value $p \in [0, 1]$ 
is a coherent value for $P(\mathcal{F}|T)$, so that no conclusion can be reached - from the 
given premises - on Tweety's ability to fly. In other words, interpreting an 
equality such as $P(E|H) = 1$ as a sort of weak implication (denoted by $\longmapsto$), 
which in particular (when $H \subseteq E$) reduces to the usual one, we have shown its 
nontransitivity: in fact 

$$T \longmapsto \mathcal{P} \quad \text{and} \quad \mathcal{P} \longmapsto \mathcal{F}^c,$$ 

but it does not necessarily follow that $T \longmapsto \mathcal{F}^c$ (even if we might have that 
$P(\mathcal{F}^c|T) = 1$, i.e. that “Tweety usually does not fly”). 
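
The nontransitivity argument can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: atoms are triples of truth values (Tweety, penguin, flies), and `tweety_class(p)` is a hypothetical two-layer class of probabilities that respects both premises while giving the conclusion any prescribed value $p$.

```python
from fractions import Fraction

def cond_prob(layers, event, given):
    """P(event|given), read off the first layer giving `given` positive mass."""
    for layer in layers:
        mass_h = sum(m for atom, m in layer.items() if given(atom))
        if mass_h > 0:
            mass_eh = sum(m for atom, m in layer.items()
                          if given(atom) and event(atom))
            return Fraction(mass_eh) / mass_h
    raise ValueError("conditioning event has zero mass in every layer")

# Atoms are triples (tweety, penguin, flies) of 0/1 truth values.
tweety = lambda a: a[0] == 1
penguin = lambda a: a[1] == 1
flies = lambda a: a[2] == 1
not_flies = lambda a: a[2] == 0

def tweety_class(p):
    """Hypothetical class: layer 0 holds non-flying penguins other than Tweety,
    layer 1 holds Tweety (a penguin) flying with probability p."""
    p = Fraction(p)
    layer0 = {(0, 1, 0): Fraction(1)}
    layer1 = {(1, 1, 1): p, (1, 1, 0): 1 - p}
    return [layer0, layer1]

for p in (Fraction(0), Fraction(1, 3), Fraction(1)):
    cls = tweety_class(p)
    print(cond_prob(cls, penguin, tweety),
          cond_prob(cls, not_flies, penguin),
          cond_prob(cls, flies, tweety))
```

In each class both premises keep probability 1, yet the probability that Tweety flies equals the chosen $p$; the trick is that Tweety sits in a deeper zero-layer than the generic penguin.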

Notice the simplicity of this approach, which encompasses other well-known 
methodologies, such as that by Goldszmidt and Pearl [16]. 

6.2 Conditional Independence 

In this Section, to avoid cumbersome notation, we will denote any conjunction 
$E \wedge H$ simply by $EH$. 

It is well known that the classical definition of stochastic independence of 
two events $A$, $B$, i.e. 

$$P(AB) = P(A)P(B),$$ 

gives rise to counterintuitive situations, in particular when the given events have 
probability 0 or 1. For example, an event $A$ with $P(A) = 0$ or 1 is stochasti- 
cally independent of itself, while it is natural (due to the intuitive meaning of 
independence) to require any event $E$ to be dependent on itself. 

We recall some results from [4], extended to conditional independence in [33]. 

Definition 3 - Given a set $\mathcal{E}$ of events containing $A, B, C, A^c, B^c$, with 
$BC \neq \emptyset$, $B^cC \neq \emptyset$, and a coherent conditional probability $P$, defined on a 
family $\mathcal{C} \subseteq \mathcal{E} \times \mathcal{E}^0$ and containing $\mathcal{D} = \{A|BC, A|B^cC, A^c|BC, A^c|B^cC\}$, we say 
that $A$ is conditionally independent of $B$ given $C$ with respect to $P$ (in symbols 
$A \perp\!\!\!\perp_{cs} B|C$) if both the following conditions hold: 

(i) $P(A|BC) = P(A|B^cC)$; 

(ii) there exists a class $\{P_\alpha\}$ of probabilities agreeing with the restriction of 
$P$ to the family $\mathcal{D}$, such that 

$$\circ(A|BC) = \circ(A|B^cC) \quad \text{and} \quad \circ(A^c|BC) = \circ(A^c|B^cC).$$ 

If condition (i) holds with $P(A|BC) = 0$, then the second equality under (ii) 
is trivially satisfied, so that conditional independence is ruled by the first one (in 
other words, equality (i) is not enough to assure independence when both sides 
are null: it needs to be “reinforced” by the requirement that also their zero-layers 
must be equal). Analogously, if condition (i) holds with $P(A|BC) = 1$ (so that 
$P(A^c|BC) = 0$), independence is ruled by the second equality under (ii). 

The symmetry property 

$$A \perp\!\!\!\perp_{cs} B|C \;\Longrightarrow\; B \perp\!\!\!\perp_{cs} A|C$$ 

does not hold in general (see [33], [34]). For a “semantic” interpretation (in the 
case $C = \Omega$), we recall the following example given in [4]. 

Example 2 - Consider two events A, B, with 

P{A) = 0 , P{B) = 1 ; 

for instance, B = A = {97,44,402}, which are among the possible 

answers that can be obtained by asking a mathematician to choose at his will 
and tell us a natural number n (a possible choice could be n = [e"^^^]! + 1, where 
[x] denotes the maximum integer $\le x$). Clearly, the probability distribution of 
the possible answers is a finitely additive one, with $P(n) = 0$ for any $n$. Assessing 

P{B\A) = ^,P{B\A^) = ^, 

it turns out that $B \perp\!\!\!\perp_{cs} A$ does not hold, while $A \perp\!\!\!\perp_{cs} B$ does: in fact, assessing 

$$P(A|B) = 0 = P(A|B^c),$$ 

we need to find the relevant zero-layers, and system $(S_0)$ easily gives $P_0(AB) = 
0 < P_0(B)$ and $P_0(AB^c) = 0 < P_0(B^c)$, while system $(S_1)$ gives $P_1(AB) > 0$ 
and $P_1(AB^c) > 0$, so that 

$$\circ(A|B) = \circ(AB) - \circ(B) = 1 - 0 = \circ(AB^c) - \circ(B^c) = 1 = \circ(A|B^c).$$ 

This lack of symmetry means (roughly speaking) that the occurrence of the 
event B of positive probability does not “influence” the probability of A; but this 
circumstance does not entail, conversely, that the occurrence of the “unexpected” 
(zero probability) event A does not “influence” the (positive) probability of B. 
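
The asymmetry discussed above can be reproduced numerically. The sketch below is a hedged illustration with $C = \Omega$: the two-layer class and its numerical values are hypothetical (they are not taken from the example), and the function `cs_independent` only tests conditions (i) and (ii) of Definition 3 for this one given class, whereas the definition asks for the existence of a suitable class.

```python
from fractions import Fraction
from math import inf

# Atoms are pairs (a, b) of truth values; hypothetical two-layer class.
LAYERS = [
    {(0, 1): Fraction(1, 4), (0, 0): Fraction(3, 4)},  # P_0, with P_0(A) = 0
    {(1, 1): Fraction(1, 3), (1, 0): Fraction(2, 3)},  # P_1, living on A
]

def mass(layer, pred):
    return sum((m for atom, m in layer.items() if pred(atom)), Fraction(0))

def zero_layer(pred):
    for alpha, layer in enumerate(LAYERS):
        if mass(layer, pred) > 0:
            return alpha
    return inf

def cond_prob(e, h):
    layer = LAYERS[zero_layer(h)]  # h is assumed possible in some layer
    return mass(layer, lambda x: e(x) and h(x)) / mass(layer, h)

def cond_layer(e, h):
    return zero_layer(lambda x: e(x) and h(x)) - zero_layer(h)

def cs_independent(a, b):
    """Conditions (i) and (ii) of Definition 3 with C = Omega, for this class."""
    not_a = lambda x: not a(x)
    not_b = lambda x: not b(x)
    same_prob = cond_prob(a, b) == cond_prob(a, not_b)
    same_layers = (cond_layer(a, b) == cond_layer(a, not_b)
                   and cond_layer(not_a, b) == cond_layer(not_a, not_b))
    return same_prob and same_layers

A = lambda x: x[0] == 1
B = lambda x: x[1] == 1
print(cs_independent(A, B), cs_independent(B, A))
```

With these values $P(A|B) = P(A|B^c) = 0$ and the zero-layers agree, so $A$ is independent of $B$; in the other direction $P(B|A)$ is read from the second layer and differs from $P(B|A^c)$, so the converse fails.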

Theorem 4 - Given the possible events $AC$, $BC$, with $AC \subset C$, if $A \perp\!\!\!\perp_{cs} B|C$, 
then $A$ and $B$ are logically independent (i.e., none of the four events $AB$, $A^cB$, 
$AB^c$, $A^cB^c$ is impossible). 

In [4], [34] there are also theorems characterizing stochastic independence of 
two logically independent events $A$ and $B$ in terms of the probabilities $P(B|C)$, 
$P(B|AC)$ and $P(B|A^cC)$, giving up any direct reference to the zero-layers. 

The usual definition [10] of conditional independence does not imply logical 
independence. For example, Whittaker in [35] considers a probability distribution 
on the events $A, B, C$ such that $P(ABC) + P(A^cB^cC^c) = 1$: hence the event 
$A$ turns out to be conditionally independent of $B$ given $C$, even if $A = B = C$ 
(so that $A$ and $B$ are not logically independent). This is counterintuitive: it 
is possible to show [33] that, with our definition, the conditional independence 
does not hold. Going back to Example 2 and checking all possible conditional 
independence statements, we always get different results from the usual theory, 
for which statements such as $A \perp\!\!\!\perp B|E$ and $B \perp\!\!\!\perp E|A$ cannot even be considered; 
moreover $E \perp\!\!\!\perp A|B$ holds in spite of the logical dependence between $E$ and $A$, 
while $E \perp\!\!\!\perp_{cs} A|B$ does not hold (as can be easily checked taking the coherent 
assessment $P(E|AB) = 1 \neq P(E|A^cB) = 0$). 

This more general concept of conditional independence has been studied 
(also for random variables) in [34]. The graphoid properties (see [24]) have been 
checked: in our framework, the set of conditional independence statements is 
closed with respect to the properties of decomposition and its reverse, weak union, 
contraction and its reverse, intersection (for which there is no need to require 
strict positivity of the relevant probability) and its reverse, while the symmetric 
property does not hold in general. Therefore our independence model is more 
general than a graphoid structure (which in the usual independence models is 
induced by a strictly positive probability). 

The representation problem by means of graphs (also for models not closed 
with respect to the symmetric property) has been faced in [33]. 



6.3 Putting together different kinds of partial knowledge 

Example 3 - This example extends Example 3 given in [7]. A patient ar- 
rives at the hospital showing symptoms of choking: the doctor considers the fol- 
lowing hypotheses concerning the patient's situation: $H_1$ = “cardiac insufficiency”, 
$H_2$ = “asthma attack”, $H_3 = H_2 \wedge H$, where $H$ = “cardiac lesion”. The doctor 
does not regard them as mutually exclusive; moreover, he assumes the follow- 
ing natural logical relation: $H_3 \subseteq H_1 \wedge H_2$. The doctor makes the probability 
assessments 

7^(772) = \ , 7 ^( 773 ) = I , P(7fi V 772) = I , 

and he expresses the comparative judgement $H_2 \prec H_1$, i.e. the asthma attack 
is “less probable” than cardiac insufficiency. It means that $P(H_1) = p$, where $p$ 
must be a value strictly greater than $P(H_2)$. Its coherence is easily checked 
by referring to the system of Section 4, which has in this case the solution 

P(77i 772 771) = P - ^ , 7^(77f 772771) = ^~P, 

P(77i772773) = ^ , P{H,HIHI) = 1 = 2P(77f77|77|) 
under the constraint ^ < p < 

Put now $E$ = “taking the medicine M against asthma does not reduce choking 
symptoms”. Since the fact $E$ is incompatible with having an asthma attack ($H_2$), 
unless the patient has cardiac insufficiency or lesion (recall that $H_3$ implies both 
$H_1$ and $H_2$), then for the doctor $H_2 \wedge H_1^c \wedge E = \emptyset$. Moreover, he believes that 
usually the medicine M does not reduce symptoms if the patient suffers from 
cardiac insufficiency, so - by the theory sketched in Section 6.1 - it follows 
$P(E|H_1) = 1$. A further piece of information comes from his database, the “partial 
likelihood” P{E\H§) = |. The whole “knowledge” is coherent, as can be proved 
by solving the relevant system, and we get 

P{E^HiH2 H3 ) = P{E^HiH2 HI) = 0 , P{E^H^H2HI) = ^ , 
P{E^HiH^Hi) = 0 , P(E^H^iH^HI) = ^ , P{EHiH2H3) = ^ , 

P(EHiH 2 HI) = 0 , P(EHiHIHi) = ^ , P{EH^H^HI) = 0 

and p = ^ . Then the updating process allows to compute P{Hi\E), i = 1,2,3, 
for example P{H 2 \E) = |. For the sake of brevity, we omit other details. 

Abstract. In this paper we propose a characterization theorem for co- 
herent conditional previsions assessed on finite conditional random vari- 
ables. The main feature is the direct applicability of the results to model 
practical problems. In fact the check of coherence and the inferential 
steps are reduced to the solvability of linear systems and of linear pro- 
gramming problems, respectively. The guideline of the procedure has 
been the already stated results for coherent conditional probabilities, so 
that now we have a unified theory for uncertainty represented by belief or 
synthesized by prevision. The procedure turns out to be helpful when the 
size of the relevant quantities strongly depends on the different scenarios 
in which they are considered. A simple example shows the potential 
of the entire machinery on a decision-aid problem. 



1 Introduction 

In applied areas dealing with uncertainty there is a strong demand for procedures 
that “fuse” together wide applicability, soundness of results and dynamic 
behaviour. For these reasons (but not only these) the 
approaches that go back to the more general interpretation of probability are 
becoming more and more widespread (for a representative, but not exhaustive, list 
see for example [3,4,5,6,7,11]). In fact, in the last decades several approaches 
have been developed adopting different uncertainty measures to escape from the 
“heavy and frozen” primordial probabilistic models. However, with a careful dis- 
tinction between the logical and the numerical components of 
the problem at hand, it is possible to stay within the “safe” probabilistic ma- 
chinery while requiring little information, just what is really needed and available. 
This is possible by working with what are called “partial assessments”, that 
is, numerical evaluations given not on a fully specified framework (repre- 
sentable by mathematical structures with particularly nice properties), but on 
“macro situations” represented by arbitrary sets of quantities. 

Obviously, mathematical structures will be adopted, but just as operational 
tools: they will be generated by the information and not required “a priori”. 

In these approaches, conditional evaluations, which well represent real phe- 
nomena whose size strongly depends on the different scenarios in which they 
are studied, can be given directly and not as derived tools. 

When the uncertainty is numerically expressed by an assessment of previsions 
$\mathbb{P}$ on a set of (conditional) random variables $Y_i|H_i$, the principle that guides all 
these methodologies is that of coherence, introduced by de Finetti in [9] and 
later refreshed ([10,12]), developed for applied problems ([11]) and generalized 
([7]). This principle is mainly based on a betting scheme, so that it could appear 
too abstract for practical applications. However, thanks to several alternative 
theorems, it has been shown that it can be reduced to linear programming 
problems, which provide a good tool to deal with the representation and analysis of real 
situations (once again, as reference points see for example [3,5,7,11]). 

On the subject of conditional probability assessments (i.e. when the $Y_i$'s are 
$\{0, 1\}$ random variables) and on that of unconditional previsions (i.e. when the 
$H_i$ are the sure event), there are nowadays sound and well established results di- 
rectly applicable. In particular, conditional probabilities have been characterized 
through sequences of unconditional probability distributions obtainable by solving 
sequences of linear systems. Once such distributions are obtained, any kind of 
inference step can be easily guided. 

On the other hand, the results for conditional prevision assessments were 
almost purely theoretical, so with this paper we “build a bridge” with the operational 
aspects. 

In particular, in Sect. 2 we will give the primary notions and the already 
known results about coherent conditional probabilities and previsions. In Sect. 3 
we propose a characterization theorem for conditional coherent previsions and 
inferential steps which use linear programming techniques, in analogy with what 
has been done for coherent conditional probabilities. Hence everything can be 
easily implemented for automatic decision-aid systems working on a finite frame- 
work, which is the natural choice for practical applications. Finally, to better 
understand the whole “apparatus”, we also describe in Sect. 4 a simplified example 
of application of our results. 

2 Preliminaries 

First of all we need to introduce a suitable notation to adapt to our purpose 
the different (but strictly connected) approaches present in the literature (see 
for example [7,10]). 

Given an arbitrary finite set $\mathcal{V}$ of finite random variables (r.v.) we denote by 
$\langle\mathcal{V}\rangle$ the algebra spanned by the elementary events $\{V_i = v\}$, with $V_i \in \mathcal{V}$. 
Denoting by $\Omega$ the set of all the atoms $A_r$ of $\langle\mathcal{V}\rangle$, $r = 1, \dots, \#\Omega$, any r.v. 
$V_i \in \mathcal{V}$ can be identified with a vector of $\mathbb{R}^{\#\Omega}$ (which we continue to designate 
with the same symbol). Its components $v_{ir}$ represent the values taken by $V_i$ 
on the different $A_r$'s. In particular, if the r.v. $V_i$ is an event $E$, its associated 
vector reduces to the indicator of the atoms $A_r \subseteq E$ (the sure event $\Omega$ coincides 
with the set of all the atoms and will have all components equal to 1, while 

^ Throughout the paper we assume that we work in a finite framework, so 
that, when not explicitly mentioned, finiteness must be understood. 

^ With #(A) we denote the cardinality of any set A. 




A. Capotorti and T. Paneni 



the impossible event $\emptyset$ coincides with the empty set of atoms and will have all 
components equal to 0). 

In this way we can represent the logical connectives by arithmetic operations 
applied component-wise to vectors, as shown in the following table: 

    set repr.            vector repr. 
    $E \subseteq F$      $E \le F$ 
    $E \cap F$           $E F$ (product) 
    $E \cup F$           $E \vee F$ (max) 


These component-wise operations can be extended also to linear combina- 
tions of random variables, so that to $\sum_i \lambda_i V_i$ is associated the vector with $r$-th 
component $\sum_i \lambda_i v_{ir}$. 

With such notation we can also easily represent the linear decomposition 
of the probability of an event $P(E)$ on its components $P(A_r)$. In fact, if $X$ 
represents a vector whose components are $x_r = P(A_r)$, with $P$ a probability 
distribution, then, for every event $E$, $P(E) = E \cdot X = \sum_{A_r \subseteq E} x_r$, where $\cdot$ stands 
for the scalar product. 
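
The vector representation just described is easy to try out. The following Python sketch is an illustration under assumptions: the four atom names, the distribution `X` and the helper names are invented for the example; only the rules themselves (inclusion as component-wise $\le$, conjunction as product, disjunction as max, probability as scalar product) come from the text.

```python
from fractions import Fraction

ATOMS = ["A1", "A2", "A3", "A4"]   # hypothetical atoms of the algebra
X = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]  # x_r = P(A_r)

def indicator(event):
    """0/1 vector of an event given as a set of atom names."""
    return [1 if atom in event else 0 for atom in ATOMS]

def implies(e, f):
    """Set inclusion E <= F, checked component-wise."""
    return all(er <= fr for er, fr in zip(e, f))

def conj(e, f):
    """Conjunction as component-wise product."""
    return [er * fr for er, fr in zip(e, f)]

def disj(e, f):
    """Disjunction as component-wise maximum."""
    return [max(er, fr) for er, fr in zip(e, f)]

def dot(v, x):
    """Scalar product: a probability for an event, a prevision for a r.v."""
    return sum(vr * xr for vr, xr in zip(v, x))

E = indicator({"A1", "A2"})
F = indicator({"A2", "A3"})
Y = [10, -2, 0, 4]                 # a random variable: its value on each atom
print(dot(E, X), dot(conj(E, F), X), dot(disj(E, F), X), dot(Y, X))
```

The same `dot` serves for events and for general random variables, which is what makes the later linear systems uniform in the two cases.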

We will mainly deal with a set of conditional random variables 
$\mathcal{D} = \{Y_1|H_1, \dots, Y_n|H_n\}$. A conditional r.v. $Y_i|H_i$ is usually understood as the 
restriction of the r.v. $Y_i$ to the atoms in $H_i$. 

We do not require the set $\mathcal{D}$ to have any particular property (like being a ring or 
being closed under some operation), but we need to know the logical relations 
among both the events of the form $\{Y_i = y_i\}$ and the $H_i$'s to be able to know 
(at least “potentially”) the algebra $\mathcal{A} = \langle\{Y_iH_i : Y_i|H_i \in \mathcal{D}\}\rangle$. 

Very often we will refer to atoms $A_r \in \mathcal{A}$ with the further property of being 
inside some conditioning event $H_i$, so that we will denote by $H_0$ the union of all 
the conditioning events ($H_0 = \bigvee_{i=1}^{n} H_i$) and by $\mathcal{A}_{|H_0}$ the subalgebra generated by such 
atoms. 

The set $\mathcal{D}$ will be used to formalize the structure of quantities involved in a 
practical decision process together with their specific scenarios (the hypotheses 
$H_i$'s). On the other hand, uncertainty about the possible values for the $Y_i|H_i$'s will 
be summarized by an $n$-vector $\mathbb{P}$ of real numbers, whose components $\mathbb{P}(Y_i|H_i)$ 
represent the previsions on the single $Y_i|H_i$'s expressed by a subject (an “expert” 
or more generally a decision maker). To obtain sound decisions, such values 
must be consistent, and for this de Finetti [8] introduced the Coherence 
Principle, which others ([7,10,11,12]) have later refreshed, extended and 
applied. 

A full and detailed presentation of all the known properties and results is 
impossible here, so we limit ourselves to reporting the definitions and theorems introduced in 
the cited papers that we will use in the sequel. The notation is adapted to our 
framework, even if the vector notation is not always adopted because, although it 
is more concise than others, it sometimes makes the properties harder to 
understand. 

Definition 1. (The Principle of Conditional Coherence) Let $\mathbb{P}$ be a map 
from $\mathcal{D}$ to $\mathbb{R}$. Then $\mathbb{P}$ is a coherent conditional prevision on $\mathcal{D}$ iff for all 
$Y_1|H_1, \dots, Y_n|H_n \in \mathcal{D}$ and $\lambda_1, \dots, \lambda_n \in \mathbb{R}$, the random number (random gain) 

$$G = \sum_{i=1}^{n} \lambda_i H_i \big(Y_i - \mathbb{P}(Y_i|H_i)\big)$$ 

is such that $\min G_{|H_0} \cdot \max G_{|H_0} \le 0$, where $H_0 = H_1 \vee \dots \vee H_n$ and $\min G_{|H_0}$, 
$\max G_{|H_0}$ are the minimum and the maximum, respectively, of the components 
associated to the atoms $A_r \subseteq H_0$ in the vector $G$. 

The introduction of the random gain $G$ (which has its own interpretation 
in a betting scheme) is needed to introduce “interaction” among conditional 
random variables. 
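
To make the betting condition tangible, here is a minimal Python sketch that evaluates the random gain for one fixed choice of stakes. It assumes the standard betting-scheme form $G = \sum_i \lambda_i H_i (Y_i - \mathbb{P}(Y_i|H_i))$; the function names and the toy data are hypothetical. Passing the test for one choice of the $\lambda_i$ does not prove coherence (the definition quantifies over all stakes), but failing it for a single choice is enough to reveal incoherence.

```python
def random_gain(bets, n_atoms):
    """Vector of G = sum_i lambda_i * H_i * (Y_i - p_i), one entry per atom.

    Each bet is a tuple (lambda_i, Y_i, H_i, p_i); Y_i and H_i are vectors
    indexed by atoms and p_i is the assessed prevision of Y_i|H_i.
    """
    return [sum(lam * h[r] * (y[r] - p) for lam, y, h, p in bets)
            for r in range(n_atoms)]

def passes_gain_test(bets, n_atoms):
    """Check min G * max G <= 0 over the atoms inside H_0 = H_1 v ... v H_n."""
    g = random_gain(bets, n_atoms)
    inside_h0 = [g[r] for r in range(n_atoms)
                 if any(h[r] for _, _, h, _ in bets)]
    return min(inside_h0) * max(inside_h0) <= 0

# Toy data: Y|H takes the values 1 or 3 on the two atoms inside H.
Y = [1, 3, 3]
H = [1, 1, 0]
print(random_gain([(1, Y, H, 2)], 3), passes_gain_test([(1, Y, H, 2)], 3))
print(random_gain([(1, Y, H, 4)], 3), passes_gain_test([(1, Y, H, 4)], 3))
```

A prevision of 2 lies between the possible values of $Y|H$, so the gain changes sign; a prevision of 4 lies outside them and produces a sure loss on every atom of $H$.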

When the domain $\mathcal{D}$ has particular algebraic properties, coherence is 
guaranteed by numerical conditions. One of these situations is: 

Definition 2. Let $\mathcal{Y}$ be the ring of all the random variables (including the con- 
stants and the indicator functions) with support on the finite algebra $\mathcal{A}$. Then 
a real function $\mathbb{P}$ on $\mathcal{Y}|(\mathcal{A} \setminus \{\emptyset\})$ is a full conditional prevision iff the follow- 
ing four properties hold (any time $\mathbb{P}$ appears its argument lies in $\mathcal{Y}|(\mathcal{A} \setminus \{\emptyset\})$): 

i) if $Y|H \ge 0$ then $\mathbb{P}(Y|H) \ge 0$ (positivity) 

ii) $\mathbb{P}(\cdot|H)$ is a linear map (linearity) 

iii) $\mathbb{P}(H|H) = 1$ (normality) 

iv) $\mathbb{P}(YH|K) = \mathbb{P}(H|K)\,\mathbb{P}(Y|HK)$ (multiplicative property) 

and in [10] it is shown that a full conditional prevision is coherent. 

However, the relationship between coherent and full conditional previsions goes 
further, and in particular we have the following fundamental theorem (Th. 4.4 
in [10], p. 453): 

Theorem 1. Let $\mathcal{D} = \{Y_1|H_1, \dots, Y_n|H_n\}$, $\mathcal{A} = \langle\mathcal{D}\rangle$ and $\mathcal{Y}$ be the ring of all 
the random variables (including the constants and the indicator functions) with 
support on the finite algebra $\mathcal{A}$; then $\mathbb{P} = \{\mathbb{P}(Y_1|H_1), \dots, \mathbb{P}(Y_n|H_n)\}$ is a coherent 
prevision iff there exists a full conditional prevision $\mathbb{P}'$, defined on $\mathcal{Y}|(\mathcal{A} \setminus \{\emptyset\})$, 
extension of $\mathbb{P}$. 

This result is a direct consequence of the “Extension Property” and of the 
so called “Equivalence Principle”, whose proofs in [10] are based on the betting 
scheme, involving the random gain $G$. 

When, in particular, the domain $\mathcal{D}$ is actually a set of conditional events 
$\mathcal{E} = \{E_1|H_1, \dots, E_n|H_n\}$ (i.e. the possible values for the $Y_i$ are only in $\{0,1\}$), 
instead of a general prevision we talk about a conditional probability assessment 
$P : \mathcal{E} \to [0, 1]$. In this situation the Coherence Principle and the “fullness” are 
obtained by replacing in Def. 1 and 2 the word “prevision” with “probability” and 
“the ring $\mathcal{Y}$” with “the Boolean algebra $\mathcal{A}$”. Obviously we have that the existence 
of a full extension $P'$ on $\mathcal{A}|(\mathcal{A} \setminus \{\emptyset\})$ of a conditional probability assessment $P$ 
on $\mathcal{E}$ is a necessary and sufficient condition for the coherence of $P$. 

However, for conditional probability assessments, in [3,5,7] a different (and 
more operative) kind of characterization of coherence has been introduced and 
widely adopted. We report one of their main results with an associated remark 
helpful for us in the sequel: 

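Theorem 2. Let $P : \mathcal{E} = \{E_1|H_1, \dots, E_n|H_n\} \to [0, 1]$ be a numerical assess- 
ment. The following propositions are equivalent: 

- $P$ is a coherent conditional probability 

- there exists at least one finite class of unconditional probabilities 
$\{P_0, P_1, \dots, P_k\}$, each probability $P_\alpha$ being defined on a suitable subset $\mathcal{A}_\alpha \subseteq 
\mathcal{A}$, such that for all $E_i|H_i \in \mathcal{E}$ there exists a unique $P_\alpha$ with 

$$\sum_{A_r \subseteq H_i} P_\alpha(A_r) > 0$$ 

and 

$$P(E_i|H_i) = \frac{\sum_{A_r \subseteq E_iH_i} P_\alpha(A_r)}{\sum_{A_r \subseteq H_i} P_\alpha(A_r)}.$$ 
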

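Remark 1. The probabilities $P_\alpha(A_r)$ are precisely the solutions of the systems 

$$S_\alpha = \begin{cases} \sum_{A_r \subseteq E_iH_i} x_r^\alpha = p_i \sum_{A_r \subseteq H_i} x_r^\alpha & [\text{if } P_{\alpha-1}(H_i) = 0] \\ \sum_{A_r \in \mathcal{A}_\alpha} x_r^\alpha > 0 \\ x_r^\alpha \ge 0 \end{cases} \qquad (1)$$ 

where $p_i = P(E_i|H_i)$ and the $x_r^\alpha$'s are the unknowns associated to $P_\alpha(A_r)$, with 
$A_r$ atom of $\mathcal{A}_{|H_0}$ for $\alpha = 0$ and of $\mathcal{A}_\alpha = \langle\{H_i : P_{\alpha-1}(H_i) = 0\}\rangle_{|H_0^\alpha}$ for $\alpha \ge 1$. 

Moreover, the restriction of the subalgebra to the atoms contained in $H_0^\alpha$, for 
$\alpha \ge 1$, can be avoided by taking subsets $\mathcal{A}^*_\alpha$ constituted by all the atoms $A_r$ such 
that $P_{\alpha-1}(A_r) = 0$ (not only those contained in the relevant $H_i$'s), extending 
each $P_\alpha$ on $\mathcal{A}^*_\alpha$ as 

$$P^*_\alpha(A_r) = \begin{cases} P_\alpha(A_r) & \text{if } A_r \in \mathcal{A}_\alpha \\ 0 & \text{if } A_r \in \mathcal{A}^*_\alpha \setminus \mathcal{A}_\alpha. \end{cases}$$ 

If the set of atoms with null probability in every layer is not empty, we can take an 
arbitrary probability distribution on these “remaining” atoms. 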

The operational shortcoming of Theorem 1 is that it is just an existence theorem, while 
the strength of the Characterization Theorem 2 lies in the reduction of the 
problem of checking coherence to a sequence of linear systems, whose solu- 
tions allow one to build the necessary full conditional distribution. Note that, gener- 
ally, the solution of one single system is not sufficient (see the examples reported 




An Operational View of Coherent Conditional Previsions 






in [3]) to ensure the coherence of $P$, in contrast with the definition, where 
there is a single random gain $G$. Moreover, the possible presence of different 
“zero layers” (the $\alpha$'s) not only allows one to deal with really general conditional 
assessments, but can also be skillfully used to reduce the computational com- 
plexity in implementations of automatic procedures, as already stated in [5,7] 
and later developed in [1,2]. Such a reduction is possible because it is not neces- 
sary to build all the atoms of the algebra $\mathcal{A}$ but only those that, at each layer, 
are really needed. 



3 From Conditional Probabilities to Conditional Previsions 

While the results about conditional probability assessments are nowadays well es- 
tablished, the same is not true for conditional prevision assessments. So, with the 
“guideline” of Theorem 2 and Remark 1, we propose the following characteriza- 
tion theorem: 



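Theorem 3. Let $\mathcal{D} = \{Y_1|H_1, \dots, Y_n|H_n\}$ be a finite set of conditional finite 
random variables. Then, the following assertions are equivalent: 

a) $\mathbb{P} = \{\mathbb{P}(Y_1|H_1), \dots, \mathbb{P}(Y_n|H_n)\}$ is a coherent prevision 

b) there exists (at least) a solution to a sequence of linear systems 
$S_0, S_1, \dots, S_{k+1}$ of the form 

$$S_\alpha = \begin{cases} \sum_{A_r \subseteq H_i} y_{ir}\,x_r^\alpha = \mathbb{P}(Y_i|H_i) \sum_{A_r \subseteq H_i} x_r^\alpha & \text{if } \sum_{A_r \subseteq H_i} x_r^{\alpha-1} = 0 \\ \sum_{A_r \in \mathcal{A}_\alpha} x_r^\alpha > 0 \\ \sum_{A_r \in \mathcal{A} \setminus \mathcal{A}_\alpha} x_r^\alpha = 0 \\ x_r^\alpha \ge 0 \end{cases} \qquad \alpha = 0, \dots, k \qquad (2)$$ 

and 

$$S_{k+1} = \begin{cases} \sum_{A_r \in \mathcal{A}_{k+1}} x_r^{k+1} = 1 \\ x_r^{k+1} > 0 \quad \forall A_r \in \mathcal{A}_{k+1} \\ \sum_{A_r \in \mathcal{A} \setminus \mathcal{A}_{k+1}} x_r^{k+1} = 0 \end{cases} \qquad (3)$$ 

where the $x_r^\alpha$'s are the unknowns associated to each $A_r \in \mathcal{A}$, the $y_{ir}$'s are the 
values taken by the r.v. $Y_i$'s on the $A_r$'s, the sub-algebras $\mathcal{A}_\alpha \subseteq \mathcal{A}$, $\alpha = 0, \dots, k$, 
are identified by $\mathcal{A}_\alpha = \langle\{Y_iH_i : \sum_{A_r \subseteq H_i} x_r^{\alpha-1} = 0\}\rangle_{|H_0^\alpha}$, with $x_r^{-1}$ set to 0 for 
all $r = 1, \dots, \#\mathcal{A}$ (so that $\mathcal{A}_0 = \mathcal{A}_{|H_0}$), while $\mathcal{A}_{k+1} = \{A_r \in \mathcal{A} : x_r^\alpha = 0 \ \forall \alpha = 0, \dots, k\}$. 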

Before we illustrate the proof, note that each solution of $S_\alpha$, $\alpha = 0, \dots, k$, 
represented by a vector $X^\alpha$ whose components are the $x_r^\alpha$'s, can be seen as a 
“mass” distribution on $\mathcal{A}$ with distinct constraints: in $\mathcal{A}_\alpha$ the assessed prevision 
must satisfy the multiplicative property, while on $\mathcal{A} \setminus \mathcal{A}_\alpha$ the mass is forced to 
be 0. On the contrary, the last system $S_{k+1}$ forces us to choose any probability 
mass function with support $\mathcal{A}_{k+1}$ (when it is not empty), and it is needed 
to ensure that each $A_r \in \mathcal{A}$ has a positive mass in some layer 
$\alpha \in \{0, \dots, k + 1\}$. Obviously, whenever $\mathcal{A}_{k+1}$ turns out to be empty, $S_{k+1}$ 
reduces to a vanishing system. 

In the theorem we have chosen to explicitly denote, by the summations, all 
the elements involved in the constraints; for its proof, on the contrary, we will 
adopt the “lighter” vector notation. 

Proof (of Th. 3). 

First we prove the implication b) $\Rightarrow$ a). 

By the class of solutions $\{X^0, \dots, X^{k+1}\}$ of the linear systems we can “directly” 
build a full conditional prevision $\mathbb{P}'$ on $\mathcal{Y}|(\mathcal{A} \setminus \{\emptyset\})$, extension of $\mathbb{P}$. In fact, for 
any $Y|H \in \mathcal{Y}|(\mathcal{A} \setminus \{\emptyset\})$ there is at least one $X^\alpha$ such that $H \cdot X^\alpha > 0$. Let $l$ be 
the minimum of such indexes and define 

$$\mathbb{P}'(Y|H) = \frac{YH \cdot X^l}{H \cdot X^l}; \qquad (4)$$ 

then it is easy to prove that $\mathbb{P}'$ is “full” (i.e. it satisfies conditions i)-iv) in 
Def. 2): 

- if $Y|H \ge 0$ then, by the non-negativity of the solutions $X^\alpha$, the scalar 
product $YH \cdot X^l$ is non-negative and so is the fraction (4), proving i); 

- ii) directly follows by linearity of the scalar product: 

$$\mathbb{P}'(\lambda Y + \mu Z|H) = \frac{(\lambda Y + \mu Z)H \cdot X^l}{H \cdot X^l} = \lambda \frac{YH \cdot X^l}{H \cdot X^l} + \mu \frac{ZH \cdot X^l}{H \cdot X^l} = \lambda \mathbb{P}'(Y|H) + \mu \mathbb{P}'(Z|H);$$ 

- condition iii) is trivially satisfied because we will have the same numerator 
and denominator in (4); 

- for the multiplicative property iv) we have to distinguish two cases: 

1. if the same index $l$ is associated to $K$ and $HK$ (i.e. $K \cdot X^l > 0$ and 
$HK \cdot X^l > 0$, while $K \cdot X^\alpha = 0$ and $HK \cdot X^\alpha = 0$ for all $\alpha < l$) then 

$$\mathbb{P}'(YH|K) = \frac{YHK \cdot X^l}{K \cdot X^l} = \frac{HK \cdot X^l}{HK \cdot X^l} \, \frac{YHK \cdot X^l}{K \cdot X^l} = \frac{HK \cdot X^l}{K \cdot X^l} \, \frac{YHK \cdot X^l}{HK \cdot X^l} = \mathbb{P}'(H|K)\,\mathbb{P}'(Y|HK);$$ 

2. if the two indexes, $l$ associated to $K$ and $l'$ associated to $HK$, are different 
then we will have $l' > l$ (because $HK \le K$) and, in particular, $HK \cdot X^l = 
0$. This last equality implies 

$$\mathbb{P}'(YH|K) = \frac{YHK \cdot X^l}{K \cdot X^l} = 0, \qquad \mathbb{P}'(H|K) = \frac{HK \cdot X^l}{K \cdot X^l} = 0$$ 

and hence property iv) is trivially satisfied as $0 = 0$. 

Hence $\mathbb{P}'$ is a full conditional prevision, which is sufficient, by Theorem 1, to 
guarantee the coherence of $\mathbb{P}$. 
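
The construction (4) used in this part of the proof can be sketched directly in Python. This is an illustration under assumptions: the layer vectors `LAYERS` stand for hypothetical solutions $X^0, X^1$ over three atoms, and the helper names are invented; the only ingredient taken from the text is the rule "use the first layer in which the conditioning event has positive mass".

```python
from fractions import Fraction

def dot(v, x):
    return sum(a * b for a, b in zip(v, x))

def times(u, v):
    """Component-wise product, i.e. the vector of Y*H or of H and K."""
    return [a * b for a, b in zip(u, v)]

def full_prevision(layers, y, h):
    """P'(Y|H) = (YH . X^l) / (H . X^l), l = first layer with H . X^l > 0."""
    for x in layers:
        if dot(h, x) > 0:
            return Fraction(dot(times(y, h), x)) / dot(h, x)
    raise ValueError("H has zero mass in every layer")

# Hypothetical solutions X^0, X^1 over three atoms.
LAYERS = [
    [Fraction(1), Fraction(0), Fraction(0)],
    [Fraction(0), Fraction(1, 4), Fraction(3, 4)],
]
Y = [5, 2, 6]

# Case 1 of the proof: K and HK sit in the same layer.
K, H = [0, 1, 1], [0, 0, 1]
lhs = full_prevision(LAYERS, times(Y, H), K)
rhs = full_prevision(LAYERS, H, K) * full_prevision(LAYERS, Y, times(H, K))
print(lhs, rhs)

# Case 2: K2 sits in layer 0, H2*K2 in layer 1, so both sides vanish.
K2, H2 = [1, 1, 1], [0, 1, 0]
lhs2 = full_prevision(LAYERS, times(Y, H2), K2)
rhs2 = full_prevision(LAYERS, H2, K2) * full_prevision(LAYERS, Y, times(H2, K2))
print(lhs2, rhs2)
```

Both cases of the multiplicative property come out as in the proof: a genuine product in the first, and the trivial identity $0 = 0$ in the second.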

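For the opposite implication a) $\Rightarrow$ b) we have, again by Th. 1, the existence 
of a full extension $\mathbb{P}'$. Note that $\mathbb{P}'$ restricted to $\mathcal{A}|(\mathcal{A} \setminus \{\emptyset\})$ is a full conditional 
probability $P'$ (i.e. properties i)-iv) in Def. 2 are satisfied as probabilities). By the 
remark about characterization Theorem 2, there exists a class of probabilities 
$\{P^*_0, P^*_1, \dots\}$ and an “arbitrary” probability distribution $P^{k+1}$ with the property 
that, for any conditioning event $H_i$, there is a unique index $\alpha$ such that 
$\sum_{A_r \subseteq H_i} P^*_\alpha(A_r) > 0$. 

We denote now by $X^\alpha$ the vectors with components $x_r^\alpha = P^*_\alpha(A_r)$ if $A_r \in \mathcal{A}^*_\alpha$ 
and $x_r^\alpha = 0$ if $A_r \in \mathcal{A} \setminus \mathcal{A}^*_\alpha$, for $\alpha = 0, \dots, k$, while $x_r^{k+1} = P^{k+1}(A_r)$ if $A_r \in \mathcal{A}_{k+1}$ 
and $x_r^{k+1} = 0$ if $A_r \in \mathcal{A} \setminus \mathcal{A}_{k+1}$. Fix a conditional r.v. $Y_i|H_i \in \mathcal{D}$; with the 
previous choice for $X^\alpha$ we have that, for any $\alpha \in \{0, \dots, k\}$, if $H_i \cdot X^\alpha = 0$ then 
the equation in $S_\alpha$ associated to $Y_i|H_i$ is trivially satisfied as $0 = \mathbb{P}(Y_i|H_i)\,0$; 
otherwise, by the linearity property ii), it results (remember that $\mathbb{P}'$ extends $\mathbb{P}$) 

$$\mathbb{P}(Y_i|H_i) = \mathbb{P}'(Y_i|H_i) = \mathbb{P}'\Big(\Big(\sum_{A_r \subseteq H_i} y_{ir}A_r\Big)\Big|H_i\Big) = \sum_{A_r \subseteq H_i} y_{ir}\,\mathbb{P}'(A_r|H_i) = \sum_{A_r \subseteq H_i} y_{ir}\,P'(A_r|H_i) = \sum_{A_r \subseteq H_i} y_{ir}\,\frac{P^*_\alpha(A_r)}{P^*_\alpha(H_i)} = \frac{\sum_{A_r \subseteq H_i} y_{ir}\,x_r^\alpha}{\sum_{A_r \subseteq H_i} x_r^\alpha} = \frac{Y_iH_i \cdot X^\alpha}{H_i \cdot X^\alpha} \qquad (5)$$ 

and hence the equation in $S_\alpha$ associated to $Y_i|H_i$ is fulfilled. 
Finally, by definition, it holds 

$$\Omega \cdot X^{k+1} = 1,$$ 

hence the sequence $\{X^0, \dots, X^{k+1}\}$ represents a solution of linear systems like 
$\{S_0, \dots, S_{k+1}\}$. □ 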

Note that the solution (if it exists) of the system $S_\alpha$ need not be unique, 
so that the sub-algebra $\mathcal{A}_{\alpha+1}$ need not be uniquely determined. However, the 
result does not depend on this choice. In fact, if a system is not compatible, 
it means that the restriction of the assessment $\mathbb{P}$ to $\{Y_i|H_i : H_i \in \mathcal{A}_{\alpha+1}\}$ is not 
coherent, and then the whole assessment $\mathbb{P}$ is not coherent either. 

As already stressed for conditional probabilities, by the reduction of the prob- 
lem to sequences of linear systems, it is possible to directly implement decision- 
aid systems. Also in this case we could profit from a skillful use of the layers 
to reduce the spatial complexity of such automatic procedures, but this will be 
deferred to future work. 

Once the coherence of $\mathbb{P}$ is ensured, the decision maker would like to find 
the coherent bounds for the prevision on a “new” conditional r.v. 
$Y|H$. In this paper, for the sake of simplicity, we allow only r.v. $Y$ with support 
belonging to $\mathcal{A}$ and conditioning event $H \in \mathcal{A} \setminus \{\emptyset\}$ (i.e. $Y|H$ must be logically 
dependent on $\mathcal{D}$), but the results could be generalized to any arbitrary discrete 
conditional random variable. 

From the proof of Theorem 3 we can state that the coherence of P is equivalent 
to the existence of a set V of conditional probabilities on A\{A \ {(j)}) such that 
F{Yi\Hi) coincides with the conditional expected value of Yi\Hi with respect to 
any P gV, for all Yi\Hi G V, in formulae 

Y H X“ 

F{Y,\H,) = Ep{Y,\H,) = ^ \ 'iPGV 'iY,\H, G V. 

iii ' -A. 

with X“ vector of components P*{Ar), r = 1,...,#(I2) and a such that 
Hi ■ X“ > 0. It follows that coherent bounds for F(Y\H) will be 

p; = inf Ep{Y\H) p“ = sup Ep{Y\H). (6) 

Pev p^v 

Reasoning exactly as done in [7] for the extension of coherent 
conditional probabilities, it is possible to show that $p_*$ and $p^*$ can be 
computed not with respect to the whole set V, but just with respect to the 
conditional probability $P^e \in V$ such that the number of relevant constraints 
$E_{P^e}(Y_i H_i) = \mathbb{P}(Y_i|H_i)\, P^e(H_i)$ is minimal (note that we judge such constraints 
"relevant" when they are not trivially satisfied as 0 = 0, equivalently when $H_i$ 
and $H$ belong to the same layer). 

This is operationally achievable by choosing solutions $X^\alpha$ of the systems $S_\alpha$ that, 
as long as possible, satisfy the further constraint $H \cdot X^\alpha = 0$. Only when 
this is no longer possible at a layer $\alpha$ (note that, by the definition of $S_{k+1}$, 
such an $\alpha < k + 1$ exists) can the bounds in (6) be computed, by the two linear 
programming problems 

$$p_* = \min\, YH \cdot X^\alpha, \qquad p^* = \max\, YH \cdot X^\alpha \qquad \text{under the constraints } \{S_\alpha\} \cup \{H \cdot X^\alpha = 1\}. \tag{7}$$

Any value inside the range $[p_*, p^*]$ chosen for $\mathbb{P}(Y|H)$ is guaranteed to be 
coherent together with the initial assessment P and can guide the decision 
process. 

4 A Simple Example 

We now present a simple example to illustrate all the theoretical 
steps introduced so far. The example is "artificially" simplified for 
readability. Real practical applications would obviously show the 
potential of the procedure better, but they would be much more difficult 
to describe analytically. 

Example 1. A farmer must decide whether it will be more convenient to grow artichokes 
or fennel. His profit will strongly depend on the weather in the next winter 
season: if the season is "cold" (e.g. with minimal temperatures always below 
5°C), then he will more likely get a good profit growing fennel. If the weather 
is milder (e.g. with minimal temperatures always over 0°C), then more 
likely the prices for artichokes will exceed those of fennel. In any case, within 
each kind of season some variability of the profit is possible. 

We can formalize this situation as follows: 
the different (but "overlapping") kinds of season are represented by the events 
$H_1$ = "mild weather" and $H_2$ = "cold weather", while the different plantations 
are $Y_1$ = "artichokes" and $Y_2$ = "fennel". 

The pair $(Y_1, Y_2)$ represents a random vector whose support we reduce to five 
discrete values (expressed in "thousands of euros/hectare"): 

$$(Y_1, Y_2) \in \{(1,60);\ (10,45);\ (17,15);\ (20,5);\ (30,3)\}$$

Note that the single r.v. $Y_1$ and $Y_2$ are not independent but strongly negatively 
correlated. 

$A$ is the algebra generated by the values taken by $(Y_1, Y_2)$, and 
$A_1, A_2, A_3, A_4, A_5$ are the atoms. 

In this way we can represent the different kinds of season as elements of $A$, 
in particular $H_1 = A_1 \vee A_2 \vee A_3$ and $H_2 = A_2 \vee A_3 \vee A_4 \vee A_5$. 

The farmer is not used to dealing with probability, but from his experience he 
can assess the following previsions: 

$$\mathbb{P}(Y_1|H_1) = 24 \qquad \mathbb{P}(Y_2|H_1) = 4.2 \qquad \mathbb{P}(Y_1|H_2) = 2.8 \qquad \mathbb{P}(Y_2|H_2) = 57$$

The support we can offer to the farmer before he takes any practical decision 
is, first of all, to check whether his assessment is in some way "consistent", or whether his 
strongly non-monotone behaviour (legitimated by the negative correlation) is not 
justifiable to such an extent. 

The answer is affirmative because, applying all the theoretical machinery we 
have described before, we obtain solutions for the following sequence of three 
linear systems: 

$$S_0:\ \begin{cases}
10x_2^0 + 17x_3^0 + 20x_4^0 + 30x_5^0 = 24\,(x_2^0 + x_3^0 + x_4^0 + x_5^0) \\
45x_2^0 + 15x_3^0 + 5x_4^0 + 3x_5^0 = 4.2\,(x_2^0 + x_3^0 + x_4^0 + x_5^0) \\
x_1^0 + 10x_2^0 + 17x_3^0 = 2.8\,(x_1^0 + x_2^0 + x_3^0) \\
60x_1^0 + 45x_2^0 + 15x_3^0 = 57\,(x_1^0 + x_2^0 + x_3^0) \\
x_1^0 + x_2^0 + x_3^0 + x_4^0 + x_5^0 > 0 \\
x_i^0 \ge 0, \quad i = 1,\ldots,5
\end{cases}$$

The only normalized solution of this system is 
$x_4^0 = \frac{3}{5}$, $x_5^0 = \frac{2}{5}$, $x_1^0 = x_2^0 = x_3^0 = 0$; then $P(H_2) > 0$ and $P(H_1) = 0$. 

$$S_1:\ \begin{cases}
x'_1 + 10x'_2 + 17x'_3 = 2.8\,(x'_1 + x'_2 + x'_3) \\
60x'_1 + 45x'_2 + 15x'_3 = 57\,(x'_1 + x'_2 + x'_3) \\
x'_1 + x'_2 + x'_3 > 0 \\
x'_4 + x'_5 = 0
\end{cases}$$

The only normalized solution of this system is 
$x'_1 = \frac{4}{5}$, $x'_2 = \frac{1}{5}$, $x'_3 = x'_4 = x'_5 = 0$; then $P(H_1) > 0$. 

$$S_2:\ \begin{cases}
x''_1 = x''_2 = x''_4 = x''_5 = 0
\end{cases}$$

The farmer afterwards receives new information regarding the weather. 
According to the weather forecast, the forthcoming winter will be neither too cold 
nor too mild. For this reason he has to choose between the two new conditional 
r.v. $Y_i|H$, $i = 1,2$, with $H = H_1 H_2$. We can apply what we have stated for the 
extension of P and, since the solutions $\{X^0, X', X''\}$ found are unique apart from a 
scale factor, the "highest" layer reachable by $H$ is in $S_1$. Hence the coherent values 
for the extension are in this case uniquely determined as 

$$\mathbb{P}(Y_1|H) = \frac{y_{12}\,x'_2}{x'_2} = y_{12} = 10 \qquad \text{and} \qquad \mathbb{P}(Y_2|H) = \frac{y_{22}\,x'_2}{x'_2} = y_{22} = 45,$$

so that the farmer opts to grow fennel. 
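
The arithmetic of this example can be checked mechanically. The following Python sketch uses exact rationals to confirm that the normalized solutions of the first two systems reproduce the assessed values 24, 4.2, 2.8 and 57, and that the event $A_2 \vee A_3$ first receives positive mass at the second layer. The helper names (`cond_prevision`, `first_positive_layer`) and the 0-based atom indices are illustrative assumptions, not part of the procedure described above; the conditioning events are entered directly as the atom sets that appear in the displayed systems.

```python
from fractions import Fraction as F

# Values of (Y1, Y2) on the atoms A1..A5 (thousands of euros/hectare).
Y1 = [1, 10, 17, 20, 30]
Y2 = [60, 45, 15, 5, 3]

def cond_prevision(values, event, x):
    """Return sum_r values[r]*x[r] / sum_r x[r] over the atoms in `event`,
    or None when the event has zero mass under x (the constraint reads 0 = 0)."""
    mass = sum(x[r] for r in event)
    if mass == 0:
        return None
    return sum(values[r] * x[r] for r in event) / mass

def first_positive_layer(event, layers):
    """Index of the first layer whose solution gives `event` positive mass."""
    for alpha, x in enumerate(layers):
        if sum(x[r] for r in event) > 0:
            return alpha
    return None

# Atom sets (0-based indices) used as conditioning events in the systems.
E_2345 = [1, 2, 3, 4]   # A2 v A3 v A4 v A5
E_123 = [0, 1, 2]       # A1 v A2 v A3
H = [1, 2]              # A2 v A3, the conjunction of the two events

# Normalized solutions of the first two systems.
x0 = [F(0), F(0), F(0), F(3, 5), F(2, 5)]
x1 = [F(4, 5), F(1, 5), F(0), F(0), F(0)]
layers = [x0, x1]

print(cond_prevision(Y1, E_2345, x0), cond_prevision(Y2, E_2345, x0))  # 24 21/5
print(cond_prevision(Y1, E_123, x1), cond_prevision(Y2, E_123, x1))    # 14/5 57
alpha = first_positive_layer(H, layers)
print(alpha, cond_prevision(Y1, H, layers[alpha]),
      cond_prevision(Y2, H, layers[alpha]))                            # 1 10 45
```

Exact fractions are used instead of floats so that equalities such as 21/5 = 4.2 hold without rounding tolerance.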

Note that in this example the different layers are strictly needed, because 
the first linear system $S_0$ does not admit a solution that gives positive 
probability to both hypotheses. Hence, whenever one stops at the first system (as 
usually done), the assessment could seem inconsistent. Moreover, even if the 
different kinds of season could be modeled as fuzzy entities, in our approach 
they are "crisply" identified through the different synthesized values of the random 
variables. In this way we avoid the crucial point of the choice of the membership 
functions, together with the robustness study that should be performed. 



5 Conclusions 

With this paper we have extended the fruitful results [3, 4, 5, 6, 7] on the 
characterization of coherent conditional probabilities, and inferences based on them, to 
coherent conditional previsions. These operational results enlarge the applicability 
of partial assessments guided by coherence. In fact, conditional previsions allow one 
to fully synthesize the behaviour of relevant quantities, contingent on different 
scenarios. Such syntheses can be easily performed and interpreted, and do not 
require deep probabilistic knowledge from the user. Note that, even if the decision 
maker does not express a probabilistic evaluation of the hypotheses $H_1, \ldots, H_n$, 
his conditional previsions $\mathbb{P}(Y_i|H_i)$ implicitly express his belief in the occurrence 
of the different $H_i$'s (this belief is obtainable from the set of admissible solutions 
$X^0, \ldots, X^{k+1}$ of the systems $S_\alpha$). 

With our results we give an operational tool to fruitfully exploit the generality 
and potential of partial conditional assessments, showing also in this case that 
in conditional frameworks it is absolutely necessary to introduce different levels of 
interaction (represented by the layers $\alpha$) among the quantities involved. 

Abstract. When solving a decision problem we want to determine an 
optimal policy for the decision variables of interest. A policy for a deci- 
sion variable is in principle a function over its past. However, some of the 
past may be irrelevant and for both communicational as well as compu- 
tational reasons it is important not to deal with redundant variables in 
the policies. In this paper we present a method to decompose a decision 
problem into a collection of smaller sub-problems s.t. a solution (with 
no redundant variables) to the original decision problem can be found 
by solving the sub-problems independently. The method is based on an 
operational characterization of the future variables being relevant for a 
decision variable, thereby also providing a characterization of those parts 
of a decision problem that are relevant for a particular decision. 



1 Introduction 

Influence diagrams (IDs) were introduced in [3] and serve as a powerful model- 
ing tool for symmetric decision problems with a single decision maker. The ID 
supplies a natural representation for capturing the semantics of decision mak- 
ing “with a minimum of clutter and confusion for the non-quantitative decision 
maker” [12]. 

Given an ID representation of a decision problem, we can use the explicit 
structural information for analyzing relevance. For example, when analyzing the 
ID depicted in Figure 1 we find that knowledge of the states of $D_1$, A, $D_2$ and 
$D_3$ does not improve decision $D_4$ as long as the states of B and C are known. 
Hence, the variables $D_1$, A, $D_2$ and $D_3$ are not required for $D_4$, and having 
identified these irrelevant variables the domain of the policy for $D_4$ is reduced 
considerably; different operational characterizations of the required past for a 
decision variable have been found independently in [10,6]. 

Analogously to the possible occurrence of redundant variables in the past of 
a decision, there may also exist future variables which are irrelevant. By also 
identifying these variables, we obtain a complete specification of the parts of a 
decision problem that are relevant for a particular decision. 

The advantages of being able to characterize the relevant parts of a decision 
problem are twofold. First of all, if some decision variable is of particular 
importance, then we can focus our attention on the parts relevant for that decision 
variable when specifying the quantitative part (probabilities and utilities) of the 
model. Similarly, when using the ID as a tool for communication we can answer 
questions as to whether a variable X provides information which is relevant when 
deciding on a decision D. Being able to answer such questions is also useful in the 
context of information filtering, i.e., the problem of adjusting the configuration 
and quantity of information displayed to a decision maker (see e.g. [2]). 

Fig. 1. The figure depicts an ID with four chance nodes A, B, C and E, four decision 
nodes $D_1$, $D_2$, $D_3$, $D_4$ and three value nodes; the total utility is the sum $V_1 + V_2 + V_3$. 

The second advantage has to do with the evaluation. When referring to the 
evaluation of an ID we usually mean computing an optimal policy for all decisions 
involved. This interpretation can, however, be slightly misleading as we might 
only be interested in a subset of the decisions; actually, in most cases we are 
only interested in the initial decision as this decision can be seen as generating a 
new decision problem. When only a subset of the decisions is of interest the ID 
may contain superfluous information, i.e., variables which have no impact on the 
solution for the decisions in question. By removing such irrelevant variables we 
can reduce the size of the problem and, thereby, the computational complexity 
of solving the decision problem. 

In this paper we propose a method to decompose an ID into a collection 
of smaller IDs. The decomposition produces an ID for each decision variable 
of interest, and each of these IDs contains exactly the variables necessary and 
sufficient for determining an optimal policy for the associated decision variable; 
hence, the IDs can be solved independently of each other. The decomposition 
does not place any restrictions on the subsequent evaluation so we can use any 
evaluation scheme that we find suitable (see e.g. [11,13,4,5]). Additionally, since 
the IDs only include variables that are relevant for the associated decision 
variables, we are always given policies with no redundant variables. Note that the 
methods cited above do not ensure this property themselves, i.e., solving an 
ordinary ID is not guaranteed to produce optimal policies with no redundant 
variables. 

In Section 2 we formally introduce the ID and the terms and notation used 
throughout the paper. In Section 3 we give an operational characterization of the 
variables being relevant for a given decision variable. Based on this characteri- 
zation we show how an influence diagram can be decomposed into a collection 
of smaller influence diagrams. Finally, the proposed method is illustrated by 
decomposing the influence diagram in Figure 1. 



2 Preliminaries 

An ID can be seen as a Bayesian network augmented with decision nodes and 
value nodes. Thus, an ID is a directed acyclic graph $G = (\mathcal{U}, \mathcal{E})$, where the nodes 
$\mathcal{U}$ can be partitioned into three disjoint subsets: chance nodes, decision nodes 
and value nodes. 

The chance nodes $\mathcal{U}_C$ correspond to chance variables, and represent events 
which are not under the direct control of the decision maker. The decision nodes 
$\mathcal{U}_D$ correspond to decision variables and represent actions that are under the 
direct control of the decision maker. In the remainder of this paper we assume a 
total ordering of the decision nodes, indicating the order in which the decisions 
are made (the ordering of the decision nodes is traditionally represented by a 
directed path which includes all decision nodes.) Furthermore, we will use the 
concept of node and variable interchangeably if this does not introduce any 
inconsistency. We will also assume that no barren nodes are specified by the ID 
since they have no impact on the decisions (see [11]); a chance node or a decision 
node is said to be barren if it has no children, or if all its descendants are barren. 
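
The recursive definition of a barren node can be turned into a simple pruning loop. The sketch below is only an illustration under stated assumptions: the graph is given as a set of `(parent, child)` arcs, the function name `remove_barren` is hypothetical, and the small graph at the end is made up rather than taken from Figure 1.

```python
def remove_barren(nodes, arcs, value_nodes):
    """Return the nodes that remain after repeatedly deleting every chance or
    decision node without children (value nodes are never deleted)."""
    nodes = set(nodes)
    arcs = set(arcs)
    changed = True
    while changed:
        changed = False
        has_child = {p for (p, c) in arcs}
        for n in list(nodes):
            if n not in value_nodes and n not in has_child:
                # n is barren: drop it together with its incoming arcs.
                nodes.discard(n)
                arcs = {(p, c) for (p, c) in arcs if c != n}
                changed = True
    return nodes

# Hypothetical graph: A -> B -> V (value node), A -> C -> D.
arcs = {("A", "B"), ("B", "V"), ("A", "C"), ("C", "D")}
print(sorted(remove_barren({"A", "B", "C", "D", "V"}, arcs, {"V"})))
# ['A', 'B', 'V']
```

D is removed first because it has no children; C then becomes childless and is removed in the next pass, which mirrors the clause "all its descendants are barren".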

With each chance variable and decision variable X we associate a state space 
sp(X) which denotes the set of possible outcomes/decision options for X. For 
a set $\mathcal{U}'$ of chance variables and decision variables we define the state space as 
$sp(\mathcal{U}') = \times\{sp(X) \mid X \in \mathcal{U}'\}$. 

The uncertainty associated with a chance variable C is represented by a 
conditional probability potential $P(C|\pi_C) : sp(\{C\} \cup \pi_C) \to [0;1]$, where $\pi_C$ 
denotes the parents of C in the ID. The domain of a conditional probability 
potential $\phi_C = P(C|\pi_C)$ is denoted by $dom(\phi_C) = \{C\} \cup \pi_C$. 

The set of value nodes $\mathcal{U}_V$ defines a set of utility potentials; value nodes 
have no descendants. Each utility potential indicates the local utility for a given 
configuration of the variables in its domain. The domain of a utility potential 
$\psi_V$ is denoted by $dom(\psi_V) = \pi_V$, where V is the value node associated with $\psi_V$. 
The total utility is the sum or the product of the local utilities (see [14]); in the 
remainder of this paper we assume that the total utility is the sum of the local 
utilities. Analogously to the concepts of variable and node, we shall sometimes 
use the terms value node and utility potential interchangeably. 

A realization of an ID I is an attachment of potentials to the appropriate 
variables in I, i.e., a realization is a set $\{P(C|\pi_C) : C \in \mathcal{U}_C\} \cup \{\psi_V(\pi_V) : V \in \mathcal{U}_V\}$. 
Hence, a realization specifies the quantitative part of the model, whereas 
the ID constitutes the qualitative part; we will sometimes use $A_X$ to denote the 
potential associated with the variable X if it is of no importance whether $A_X$ is 
a utility potential or a probability potential. 

The arcs in an ID can be partitioned into three disjoint subsets, corresponding 
to the type of node they go into. Arcs into value nodes represent functional 
dependencies by indicating the domain of the associated utility potential. Arcs 
into chance nodes, termed dependency arcs, represent probabilistic dependencies, 
whereas arcs into decision nodes, termed informational arcs, imply information 
precedence; if there is an arc from a node X to a decision node D then the state 
of X is known when decision D is made. 

Let $\mathcal{U}_C$ be the set of chance variables and let $\mathcal{U}_D = \{D_1, D_2, \ldots, D_n\}$ be the 
set of decision variables. Assuming that the decision variables are ordered by 
index, the set of informational arcs induces a partitioning of $\mathcal{U}_C$ into a collection of 
disjoint subsets $C_0, C_1, \ldots, C_n$. The set $C_j$ denotes the chance variables observed 
between decision $D_j$ and $D_{j+1}$. Thus the variables in $C_j$ occur as parents of $D_{j+1}$. 
This induces a partial order $\prec$ on $\mathcal{U}_C \cup \mathcal{U}_D$, i.e., $C_0 \prec D_1 \prec C_1 \prec \cdots \prec D_n \prec C_n$. 

The set of variables known to the decision maker when deciding on $D_j$ is 
called the informational predecessors of $D_j$ and is denoted $pred(D_j)$. By 
assuming that the decision maker remembers all previous observations and decisions, 
we have $pred(D_i) \subseteq pred(D_j)$ for $i < j$, and $pred(D_j)$ is the set of variables that occur 
before $D_j$ under $\prec$. This property is known as no-forgetting and from this we can 
assume that an ID does not contain any no-forgetting arcs, i.e., $\pi_{D_i} \cap \pi_{D_j} = \emptyset$ 
if $D_i \neq D_j$. 

The state configuration of pred(Dj) observed before deciding on Dj induces 
a set of independence relations on the variables in U. These relations can be 
determined using the well-known d-separation criteria: 

Definition 1 Let G be a directed acyclic graph. If $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ are disjoint 
subsets of the nodes in G, then $\mathcal{X}$ and $\mathcal{Z}$ are said to be d-separated given $\mathcal{Y}$ if 
each path between a node in $\mathcal{X}$ and a node in $\mathcal{Z}$ contains a node Y s.t.: 

- Y is an intermediate node in a converging connection (head-to-head), and 
neither Y nor any of its descendants are in $\mathcal{Y}$, or 

- Y is an intermediate node in a serial or a diverging connection (head-to-tail 
or tail-to-tail), and $Y \in \mathcal{Y}$. 

If $\mathcal{X}$ and $\mathcal{Z}$ are not d-separated given $\mathcal{Y}$, then we say that $\mathcal{X}$ and $\mathcal{Z}$ are 
d-connected given $\mathcal{Y}$, and the paths connecting $\mathcal{X}$ and $\mathcal{Z}$ are called active. 

[1] and [9] present efficient algorithms for detecting d-separation directly from 
the topology of the graph by examining the paths connecting $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$. 
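
For small graphs, d-separation can also be tested without enumerating paths. The sketch below does not implement the algorithms of [1] or [9]; it swaps in the well-known equivalent moralization criterion: the two sets are d-separated given the conditioning set exactly when the conditioning set separates them in the moral graph of the smallest ancestral set containing all three sets. The function name, the arc-set representation and the test graphs are assumptions made for illustration.

```python
from itertools import combinations

def d_separated(arcs, xs, zs, ys):
    """True iff node sets xs and zs are d-separated given ys in the DAG whose
    arcs are (parent, child) pairs; xs, ys and zs are assumed disjoint."""
    xs, zs, ys = set(xs), set(zs), set(ys)
    parents = {}
    for p, c in arcs:
        parents.setdefault(c, set()).add(p)
    # 1) Smallest ancestral set containing xs, ys and zs.
    anc, stack = set(), list(xs | ys | zs)
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents.get(n, ()))
    # 2) Moral graph: undirected parent-child links plus links between co-parents.
    nbrs = {n: set() for n in anc}
    for c in anc:
        ps = parents.get(c, set())
        for p in ps:
            nbrs[p].add(c)
            nbrs[c].add(p)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    # 3) Search from xs that never enters ys.
    seen, stack = set(), list(xs)
    while stack:
        n = stack.pop()
        if n not in seen and n not in ys:
            seen.add(n)
            stack.extend(nbrs[n])
    return not (seen & zs)

collider = {("A", "C"), ("B", "C"), ("C", "D")}   # A -> C <- B, C -> D
chain = {("A", "B"), ("B", "C")}                  # A -> B -> C
print(d_separated(collider, {"A"}, {"B"}, set()))   # True
print(d_separated(collider, {"A"}, {"B"}, {"D"}))   # False
print(d_separated(chain, {"A"}, {"C"}, {"B"}))      # True
```

The second call illustrates the head-to-head clause of Definition 1: observing a descendant of the converging node C activates the path between A and B.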



2.1 Evaluation 

Solving an ID amounts to computing a policy for the decisions involved. A policy 
can be seen as a prescription of responses to earlier observations and decisions, 
and the set of policies for all the decision variables constitutes a strategy for 
the ID. The evaluation is usually performed according to the maximum expected 
utility principle, which states that we should always choose an alternative that 
maximizes the expected utility. 

Definition 2 Let I be an ID and let $\mathcal{U}_D$ denote the decision variables in I. A 
strategy is a set of functions $\Delta = \{\delta_D \mid D \in \mathcal{U}_D\}$, where $\delta_D$ is a policy given by: 

$$\delta_D : sp(pred(D)) \to sp(D).$$

A strategy that maximizes the expected utility is termed an optimal strategy, 
and each policy in an optimal strategy is termed an optimal policy. 

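The optimal policy for decision variable $D_n$ is given by* 

$$\delta_{D_n}(pred(D_n)) = \arg\max_{D_n} \sum_{C_n} P(C_n \mid C_0, D_1, \ldots, C_{n-1}, D_n) \sum_{V \in \mathcal{U}_V} \psi_V,$$

and the maximum expected utility potential for decision $D_n$ is 

$$\rho_{D_n}(pred(D_n)) = \max_{D_n} \sum_{C_n} P(C_n \mid C_0, D_1, \ldots, C_{n-1}, D_n) \sum_{V \in \mathcal{U}_V} \psi_V.$$

In general, the optimal policy for decision $D_k$ is given by 

$$\delta_{D_k}(pred(D_k)) = \arg\max_{D_k} \sum_{C_k} P(C_k \mid C_0, D_1, \ldots, C_{k-1}, D_k)\, \rho_{D_{k+1}}, \tag{1}$$

where $\rho_{D_{k+1}}$ is the maximum expected utility potential for decision $D_{k+1}$: 

$$\rho_{D_{k+1}}(pred(D_{k+1})) = \max_{D_{k+1}} \sum_{C_{k+1}} P(C_{k+1} \mid C_0, D_1, \ldots, C_k, D_{k+1})\, \rho_{D_{k+2}}.$$

By continuously expanding Equation 1, we get the following expression for the 
optimal policy for $D_k$: 

$$\delta_{D_k}(pred(D_k)) = \arg\max_{D_k} \sum_{C_k} \cdots \max_{D_n} \sum_{C_n} P(C_k, \ldots, C_n \mid C_0, \ldots, C_{k-1}, D_1, \ldots, D_n) \sum_{V \in \mathcal{U}_V} \psi_V. \tag{2}$$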
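
As a small numerical illustration of the expressions for the last decision, the Python sketch below computes the policy and the maximum expected utility potential for a toy problem with one observed chance variable B, one decision D and one chance variable C observed after D. All names, state spaces and numbers are made up for illustration; with a single utility potential the inner sum over value nodes reduces to one term.

```python
# Hypothetical toy problem: B is observed before D; C is revealed after D.
B_STATES = ["b0", "b1"]
D_STATES = ["d0", "d1"]
C_STATES = ["c0", "c1"]

# Assumed conditional probability potential P(C | B, D), keyed by (b, d) then c.
P_C = {
    ("b0", "d0"): {"c0": 0.9, "c1": 0.1},
    ("b0", "d1"): {"c0": 0.5, "c1": 0.5},
    ("b1", "d0"): {"c0": 0.2, "c1": 0.8},
    ("b1", "d1"): {"c0": 0.5, "c1": 0.5},
}

# Assumed single utility potential psi(C, D).
PSI = {("c0", "d0"): 10.0, ("c1", "d0"): 0.0,
       ("c0", "d1"): 6.0, ("c1", "d1"): 6.0}

def expected_utility(b, d):
    """sum over C of P(C | b, d) * psi(C, d)."""
    return sum(P_C[(b, d)][c] * PSI[(c, d)] for c in C_STATES)

def solve_last_decision():
    """Return (policy, meu) with policy[b] = argmax_d EU(b, d) and
    meu[b] = max_d EU(b, d), one entry per observed state of B."""
    policy, meu = {}, {}
    for b in B_STATES:
        best = max(D_STATES, key=lambda d: expected_utility(b, d))
        policy[b] = best
        meu[b] = expected_utility(b, best)
    return policy, meu

policy, meu = solve_last_decision()
print(policy)  # {'b0': 'd0', 'b1': 'd1'}
```

In a problem with several decisions, the dictionary `meu` would play the role of the potential that is passed back to the preceding decision in Equation 1.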

Not all the observed variables (i.e. $pred(D_k)$) necessarily influence the optimal 
policy for $D_k$; hence we introduce the notion of a required variable: 

Definition 3 Let I be an ID and let D be a decision variable in I. The variable 
$X \in pred(D)$ is said to be required for D if there exists a realization of I and a 
configuration $y$ over $dom(\delta_D) \setminus \{X\}$ s.t. $\delta_D(x_1, y) \neq \delta_D(x_2, y)$, where $x_1$ and $x_2$ 
are different states of X. The set of variables required for D is denoted req(D). 

* For the sake of simplifying notation, we shall assume that for all decision variables 
$D_i$ there is always exactly one element in $\arg\max_{D_i}(\cdot)$. 

Finally, when calculating an optimal strategy for an ID I, the variables in 
$C_0 = pred(D_1)$ are never marginalized out; marginalizing out a chance variable 
corresponds to a summation over its state space. This allows us to define a 
partial realization $\mathcal{R}$ of I as a realization which does not necessarily include the 
potential $A_X$ if there does not exist a variable $Y \in dom(A_X)$ s.t. $C_0 \prec Y$. We 
allow a (partial) realization $\mathcal{R}$ to be extended with a set $\mathcal{R}'$ of potentials if there 
does not exist a potential in $\mathcal{R}'$ with the same domain as a potential in $\mathcal{R}$. That 
is, the extension is admissible if no variable is associated with more than one 
potential w.r.t. $\mathcal{R} \cup \mathcal{R}'$. 



3 Decomposition of Influence Diagrams 

In this section we describe a method for decomposing an ID I into a collection 
of smaller IDs s.t. an optimal strategy for I can be found by solving the smaller 
IDs independently. More precisely, the decomposition produces an ID for each 
decision variable D in I s.t. an optimal policy for D can be found by solving the 
associated ID independently of the other IDs produced by the decomposition. 

Decomposing an ID I is essentially a question of identifying the potentials 
involved in the computation of $\delta_{D_i}$, for all $1 \le i \le n$ and for any realization of 
I (see Equation 2). These potentials are uniquely identified by their associated 
variables. Thus, apart from the required variables we also need to identify the 
future variables that are relevant for the decisions in the ID, i.e., the variables 
that may influence the optimal policy for a decision variable: 

Definition 4 Let I be an ID and let D be a decision variable in I. The variable 
X ($D \prec X$) is said to be relevant for D if either: 

- $X \in \mathcal{U}_C \cup \mathcal{U}_V$ and there exist two realizations $\mathcal{R}_1$ and $\mathcal{R}_2$ of I which only 
differ on the potential associated with X s.t. the optimal policies for D are 
different in $\mathcal{R}_1$ and $\mathcal{R}_2$, or 

- $X \in \mathcal{U}_D$ and there exist a realization of I and two different policies $\delta_X$ and 
$\delta'_X$ for X s.t. the optimal policies for D are different w.r.t. $\delta_X$ and $\delta'_X$. 

The set of variables relevant for D is denoted rel(D). 

The variables $req(D) \cup rel(D)$ determine the potentials involved in the computation 
of $\delta_D$. Thus, we seek a decomposition scheme which produces a collection of 
IDs s.t. each of the IDs contains exactly the relevant and required variables for 
the associated decision and, furthermore, obeys the decision sequencing specified 
by the original ID. Note that the latter property may require the inclusion of 
informational arcs which are not part of the original ID. 

Definition 5 Let I be an ID with nodes $\mathcal{U}_I$ and arcs $\mathcal{E}_I$, and let D be a decision 
variable in I. The ID $I_D$ is said to reflect the decision problem D in I if: 

- $\mathcal{U}_{I_D} = rel(D) \cup req(D) \cup \{D\}$, and 

- $\mathcal{E}_{I_D} = \{(X,Y) \mid X, Y \in \mathcal{U}_{I_D} \text{ and } (X,Y) \in \mathcal{E}_I\}\ \cup$ 

{(X, Y)\X G Y G where X Y and X Y} . 

Definition 3 and Definition 4 give semantic characterizations of both the 
variables being required and relevant for a particular decision variable. Different 
syntactical characterizations of the required variables have been found indepen- 
dently in [10,6]. In this paper we adopt the algorithm from [10], which is based 
on the notion of a chance variable representation of a policy (see [8] for further 
discussion on chance variable representations): 

Definition 6 Let D be a decision variable and let $\delta_D(req(D))$ be an optimal 
policy for D. A chance variable D' with req(D) as parents and with the potential 
$P(D'|req(D))$ defined as: 

$$P(d \mid \bar{r}) = \begin{cases} 1 & \text{if } \delta_D(\bar{r}) = d \\ 0 & \text{otherwise} \end{cases}$$

is said to be the chance variable representation of $\delta_D$. 
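
Definition 6 amounts to writing a policy as a deterministic conditional probability table. A minimal Python sketch follows; the function name, the tuple encoding of configurations and the example policy are assumptions made for illustration.

```python
from itertools import product

def chance_variable_representation(policy, parent_states, decision_states):
    """Return P(d | r) as a dict keyed by (d, r): 1.0 if policy[r] == d, else 0.0.

    policy          -- dict mapping each configuration r (a tuple) to an option
    parent_states   -- list of state lists, one per required variable
    decision_states -- list of decision options
    """
    table = {}
    for r in product(*parent_states):
        for d in decision_states:
            table[(d, r)] = 1.0 if policy[r] == d else 0.0
    return table

# Hypothetical policy for a decision with options "go"/"stay" and two
# required binary variables.
policy = {(0, 0): "stay", (0, 1): "go", (1, 0): "go", (1, 1): "go"}
cpt = chance_variable_representation(policy, [[0, 1], [0, 1]], ["go", "stay"])
print(cpt[("go", (0, 1))], cpt[("stay", (0, 1))])  # 1.0 0.0
```

Each configuration of the parents puts all probability mass on the option chosen by the policy, so the table is a valid conditional distribution.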

The algorithm proposed in [10] works by visiting each of the decision variables 
in reverse temporal order. When a decision variable is visited the variables being 
required for that decision are identified, and the decision variable is replaced by 
its chance variable representation. 



Algorithm 1 Let I be an ID and let $D_1, D_2, \ldots, D_n$ be the decision variables 
in I ordered by index. To determine the variables required for $D_i$ ($\forall\, 1 \le i \le n$) 
do: 

1) Set i := n. 

2) For each decision variable not considered (i > 0): 

a) Let $\mathcal{V}_i$ be the set of value nodes to which there exists a directed path 
from $D_i$ in I. 

b) Let $\mathcal{D}_{D_i}$ be a set of nodes s.t. $X \in \mathcal{D}_{D_i}$ if and only if X is d-connected 
to a node in $\mathcal{V}_i$ given $\{D_i\} \cup pred(D_i) \setminus \{X\}$. 

c) Set $req(D_i) := \mathcal{D}_{D_i} \cap pred(D_i)$ ($req(D_i)$ is the set of variables required 
for $D_i$). 

d) Replace $D_i$ with a chance variable representation of the policy for $D_i$. 

e) Set i := i - 1. 

The correctness of the algorithm is not proven in [10]. A proof can be found 
in [7], where it is also shown that the value nodes relevant for a decision variable 
Di are exactly the value nodes Vi found by the algorithm above. 
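
The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative reading of the steps as stated above, not the implementation of [10]: d-connection is tested with the moralization criterion rather than with the algorithms of [1] or [9], the informational predecessors are supplied by the caller, and replacing a decision by its chance variable representation is modelled purely structurally by keeping only the required variables as its parents. The small ID at the end is hypothetical and is not the ID of Figure 1.

```python
from itertools import combinations

def d_separated(arcs, xs, zs, ys):
    """Moralization test: True iff xs and zs are d-separated given ys."""
    parents = {}
    for p, c in arcs:
        parents.setdefault(c, set()).add(p)
    anc, stack = set(), list(xs | ys | zs)
    while stack:                      # ancestral set of xs, ys and zs
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents.get(n, ()))
    nbrs = {n: set() for n in anc}
    for c in anc:                     # moral graph on the ancestral set
        ps = parents.get(c, set())
        for p in ps:
            nbrs[p].add(c)
            nbrs[c].add(p)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    seen, stack = set(), list(xs)
    while stack:                      # search that never enters ys
        n = stack.pop()
        if n not in seen and n not in ys:
            seen.add(n)
            stack.extend(nbrs[n])
    return not (seen & zs)

def descendants(arcs, node):
    """All nodes reachable from `node` along directed arcs."""
    out, stack = set(), [node]
    while stack:
        n = stack.pop()
        for p, c in arcs:
            if p == n and c not in out:
                out.add(c)
                stack.append(c)
    return out

def required_variables(arcs, decisions, pred, value_nodes):
    """decisions: list in temporal order; pred: dict decision -> set of nodes."""
    arcs, req = set(arcs), {}
    for d in reversed(decisions):                        # steps 1, 2 and e
        v_i = descendants(arcs, d) & value_nodes         # step a
        req[d] = {x for x in pred[d]                     # steps b and c
                  if not d_separated(arcs, {x}, v_i, ({d} | pred[d]) - {x})}
        arcs = {(p, c) for p, c in arcs if c != d}       # step d: D_i keeps only
        arcs |= {(x, d) for x in req[d]}                 # req(D_i) as parents
    return req

# Hypothetical ID: A is observed before D1, B before D2; V depends on B and D2.
arcs = {("A", "D1"), ("A", "B"), ("D1", "B"), ("B", "D2"), ("B", "V"), ("D2", "V")}
pred = {"D1": {"A"}, "D2": {"A", "D1", "B"}}
print(required_variables(arcs, ["D1", "D2"], pred, {"V"}))
```

In this toy ID only B is required for D2, although A and D1 are also informational predecessors of D2, while A is required for D1.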

Example 1 Consider the ID depicted in Figure 1. By applying Algorithm 1 to 
this ID we start off by visiting decision $D_4$. As $pred(D_4) = \{D_1, A, D_2, B, D_3, C\}$ 
we find that only B and C are required for $D_4$; all other nodes $X \in pred(D_4)$ 
are d-separated from $\mathcal{V}_4 = \{V_2\}$ given $\{D_4\} \cup pred(D_4) \setminus \{X\}$. Note that from 
Algorithm 1 (see [7]) we can also infer that $V_2$ is the only value node relevant for 
$D_4$. Decision $D_4$ is then replaced by its chance variable representation $D'_4$ having 
B and C as parents (see Figure 2a), and the algorithm proceeds to decision $D_3$. 

Since $\mathcal{V}_3 = \{V_3\}$ in the ID produced by the previous step (see Figure 2a), we 
find that $D_2$ and B are the only variables required for $D_3$. Decision $D_3$ is then 
replaced by its chance variable representation (see Figure 2b), and the algorithm 
proceeds as above until all decision variables have been investigated. □ 

Fig. 2. The figures depict the IDs obtained from the ID in Figure 1, by applying the 
first two iterations of Algorithm 1. The dashed nodes in Figure (a) and (b) are the 
nodes required for $D_4$ and $D_3$, respectively. 

From Algorithm 1 we have a syntactical characterization of both the variables 
being required for a given decision variable D and the set of value nodes relevant 
for D. Moreover, from Definition 4 it is apparent that a decision variable D' ($D \prec D'$) 
is relevant for D if and only if there exists a utility function $\psi$ relevant for 
both D and D' (see also [6]). Thus, we have narrowed our search for relevant 
variables down to the chance variables succeeding D under $\prec$. 

Theorem 1 Let I be an ID and let $D_i$ be a decision variable in I. Let $I'$ denote 
the ID obtained from I by replacing $D_j$ with its chance variable representation, 
for $j = i + 1, \ldots, n$. Then the chance variable X ($D_i \prec X$) is relevant for $D_i$ if 
and only if 

- X is not barren in the ID formed from $I'$ by removing all value nodes that 
are not relevant for $D_i$, and 

- there exists a utility potential $\psi_V$ relevant for $D_i$ s.t. X is d-connected to V 
in $I'$ given $\{D_i\} \cup pred(D_i)$. 

A proof of Theorem 1 is given in the appendix. Now, having found a syntac- 
tical characterization of the variables being required and relevant for a decision 
variable D, we have the following theorem which shows that performing a de- 
composition according to Definition 5 is sound w.r.t. the optimal policies. 

Theorem 2 Let I be an ID and let $\mathcal{R}$ be a realization of I. If $I_D$ reflects the 
decision problem D in I, then $\mathcal{R}$ is a (partial) realization of $I_D$ and $\delta_D^{I} = \delta_D^{I_D}$. 

Proof. From the construction of $I_D$ we only need to show that for any realization 
of I, if $A_X$ is relevant for the computation of $\delta_D$ then $\{X\} \cup \pi_X \subseteq \mathcal{U}_{I_D}$. That 
is, $A_X$ is contained in the (partial) realization of $I_D$. Now, consider $X \succ D$ 
and assume that $A_X$ is a probability potential. Since $A_X$ is relevant for the 
computation of $\delta_D$ we have that $X \in \mathcal{U}_{I_D}$ (see Theorem 1). Moreover, from 
the properties of d-separation it follows that for any $Y \in \pi_X$, $Y \in req(D)$ if 
$Y \in pred(D)$ and $Y \in rel(D)$ if $Y \notin pred(D)$. On the other hand, if X is 
required for D then there exists a variable $Y \in \pi_X$ which is relevant for D; if 
this was not the case $A_X$ would not be relevant for the calculation of $\delta_D$. Thus, 
the variables in $\pi_X$ are either required or relevant for D. 

Finally, if X is a value node the proof follows immediately. □ 



Example 2 Consider the ID depicted in Figure 1. From Algorithm 1 we find 
that $V_2$ is the only value node relevant for $D_4$, and by Theorem 1 this implies 
that E is relevant for $D_4$. By Algorithm 1 we also have that B and C are 
required for $D_4$. These nodes define the potentials which are needed to compute 
an optimal policy for $D_4$, and by Definition 5 they constitute the nodes in the 
ID that reflects the decision problem $D_4$ (see Figure 3($D_4$)). Note that i) the 
informational arc from B to $D_4$ ensures that the information constraints are 
consistent with the original ID, and ii) B is not associated with a probability 
potential as it is not marginalized out when calculating an optimal policy for $D_4$ 
(we work with a partial realization). 




Fig. 3. The figure shows a decomposition of the influence diagram depicted in Figure 
1 into four smaller influence diagrams. 



Continuing to decision $D_3$, we start off by replacing $D_4$ with a chance variable 
$D'_4$ having B and C as parents. From Algorithm 1 we have that B and $D_2$ are 
required for $D_3$ since $V_3$ is relevant for $D_3$. These nodes then constitute the 
nodes in the ID that reflects the decision problem $D_3$ (see Figure 3($D_3$)); by 
Theorem 1 no future variables are relevant for $D_3$. The algorithm proceeds as 
above until all decision nodes have been considered, see Figure 3($D_2$) and Figure 
3($D_1$). It should be noted, though, that when investigating $D_1$ the chance node 
C is d-connected to $V_1$, which is relevant for $D_1$. However, C is not relevant for 
$D_1$ since it is barren in the ID formed by removing the value nodes $V_2$ and $V_3$, 
which are not relevant for $D_1$. □ 

In addition to producing policies with no redundant variables, the decomposition 
of an ID may also yield a reduction in the computational complexity when 
calculating an optimal strategy. For instance, the strong junction tree representation 
(see [4]) of the original ID depicted in Figure 1 contains a clique consisting 
of five variables. On the other hand, the largest clique in the junction trees for 
the IDs produced by the decomposition contains only four variables. 

It is easy to verify that the computational complexity of solving one sub- 
problem is not larger than the computational complexity of solving the origi- 
nal decision problem. However, when working with overlapping subproblems we 
may need to repeat calculations previously performed. An immediate approach 
to overcome this problem is to cache/reuse some of the previous computations; 
this is subject to further research. 

4 Conclusion 

In this paper we have presented an operational characterization of the future 
variables being relevant for a decision variable. These variables, together with 
the variables of the required past, describe the parts of a decision problem which 
are necessary and sufficient to consider when calculating an optimal policy for a 
particular decision variable. 

Moreover, based on this characterization we have presented a method for 
decomposing an ID into a collection of smaller IDs. A solution to the original ID 
can then be found by solving these smaller IDs independently. The decomposition 
ensures that no redundant variables are contained in the optimal strategy and, 
furthermore, it may also reduce the computational complexity of solving the 
original decision problem. 

The proposed method may also provide a way to simplify the elicitation of 
the structure of a complex decision problem. For instance, it may be possible 
to characterize conditions that are necessary and sufficient to ensure the consis- 
tency of different subproblems based on the existence of a decision problem that 
decomposes into these subproblems. 

Acknowledgement. The author would like to thank Finn V. Jensen for valu- 
able discussions and helpful comments on earlier versions of this paper. The 
author would also like to thank the anonymous reviewers for constructive com- 

Appendix: Proofs 

Before giving the proof of Theorem 1 we need the following lemma. 

Lemma 1 Let $X$ and $Y$ be chance variables in a Bayesian network $BN$, and let
$c$ be a configuration over a set of variables $Z$ not containing $X$ or $Y$.
If $X$ is not barren, then $X$ is d-separated from $Y$ given $c$ if and only if
for any two realizations $P_1$ and $P_2$ differing only on the potential
$P(X \mid \pi_X)$ it holds that $P_1(Y \mid c) = P_2(Y \mid c)$.

Proof. Consider the Bayesian network $BN'$ obtained from $BN$ by adding an
artificial binary chance variable $W$ as a parent of $X$. Let $P_1$ and $P_2$
be any two realizations differing only on the potential $P(X \mid \pi_X)$ in
$BN$, and let the conditional probability distribution for $X$ in $BN'$ be
specified s.t. $P'(X \mid \pi_X, W = 0) = P_1(X \mid \pi_X)$ and
$P'(X \mid \pi_X, W = 1) = P_2(X \mid \pi_X)$. Recall that $W$ and $Y$ are
d-separated given $c$ if and only if for any realization of $BN'$ the state of
$W$ has no impact on $P'(Y \mid c)$, i.e.,
$P'(Y \mid c, W = 0) = P'(Y \mid c, W = 1)$. Thus, $W$ is d-separated from $Y$
given $c$ if and only if for any two realizations $P_1$ and $P_2$ of $BN$,
differing only on $P(X \mid \pi_X)$, it holds that
$P_1(Y \mid c) = P_2(Y \mid c)$ in $BN$. However, from the construction of
$BN'$ we have that, as $X$ is not barren, $W$ is d-separated from $Y$ given $c$
if and only if $X$ is d-separated from $Y$ given $c$.

Based on the lemma above, we present the proof of Theorem 1. 

Proof (Theorem 1). It is easy to verify that replacing $D_j$ with its chance
variable policy, for $j = i + 1, \ldots, n$, does not change the optimal policy
for $D_i$ nor does it introduce any "false" structural dependencies (see also
[10,7]).

So we can restrict our attention to the last decision variable in $I$. From
Equation 2 we have:

$$\delta_{D_n}(\mathrm{pred}(D_n)) = \arg\max_{D_n} \sum_{C_n} P(C_n \mid C_0, \ldots, C_{n-1}, D_1, \ldots, D_n) \sum_{V \in U_V} \psi_V .$$

Without loss of generality we can assume that $\psi_V$ is the only utility
function relevant for $D_n$. Then by the distributive law we get:

$$\begin{aligned}
\delta_{D_n}(\mathrm{pred}(D_n))
 &= \arg\max_{D_n} \sum_{C_n} P(C_n \mid C_0, \ldots, C_{n-1}, D_1, \ldots, D_n)\, \psi_V(\pi_V) \\
 &= \arg\max_{D_n} \sum_{C_n \cap \pi_V} \psi_V(\pi_V) \sum_{C_n \setminus \pi_V} P(C_n \mid C_0, \ldots, C_{n-1}, D_1, \ldots, D_n) \\
 &= \arg\max_{D_n} \sum_{C_n \cap \pi_V} \psi_V(\pi_V)\, P(C_n \cap \pi_V \mid C_0, \ldots, C_{n-1}, D_1, \ldots, D_n) .
\end{aligned}$$

So, $X$ is relevant for $D_n$ if and only if $P(X \mid \pi_X)$ has an effect on
$P(C_n \cap \pi_V \mid \mathrm{pred}(D_n), D_n)$. By Lemma 1 we have that this
holds if and only if $X$ is d-connected to $C_n \cap \pi_V$ given
$\mathrm{pred}(D_n) \cup \{D_n\}$. $X$ is d-connected to $C_n \cap \pi_V$ given
$\mathrm{pred}(D_n) \cup \{D_n\}$ if and only if $X$ is d-connected to $V$
given $\mathrm{pred}(D_n) \cup \{D_n\}$, thereby completing the proof. □
Abstract. In this paper we propose the use of mixtures of truncated
exponential (MTE) distributions in hybrid Bayesian networks. We study 
the properties of the MTE distribution and show how exact probability 
propagation can be carried out by means of a local computation algo- 
rithm. One feature of this model is that no restriction is made about the 
order among the variables either discrete or continuous. Computations 
are performed over a representation of probabilistic potentials based on 
probability trees, expanded to allow discrete and continuous variables 
simultaneously. Finally, a Markov chain Monte Carlo algorithm is de- 
scribed with the aim of dealing with complex networks. 

Keywords: Hybrid Bayesian networks, MTE distribution, MTE net- 
works, probability propagation, Markov chain Monte Carlo. 



1 Introduction 

Bayesian networks provide a framework for efficiently dealing with multivariate
models. One important feature of these networks is that they allow probabilistic
inference to be performed by taking advantage of the independence relationships
among the variables. Probabilistic inference, commonly known as probability
propagation, consists of obtaining the marginal distribution on some variables
of interest given that the values of some other variables are known.

Much attention has been paid to probability propagation in networks where
the variables are qualitative. Several exact methods have been proposed in the
literature for this task [2, 7, 8, 13], all of them based on local computation.
Local computation means calculating the marginals without actually computing
the joint distribution, and is described in terms of a message passing scheme
over a structure called a join tree. Approximate methods have also been
developed with the aim of dealing with complex networks [1, 10, 12].

In mixed Bayesian networks, where both discrete and continuous variables
appear simultaneously, it is possible to apply local computation schemes
similar to those for discrete variables. However, the correctness of exact
inference depends on the model.

The most deeply studied mixed model for which exact local computation is
correct is based on the conditional Gaussian (CG) distribution [5,6,9]. In this
model, networks where discrete variables have continuous parents are not
allowed. To avoid this restriction, Koller et al. [3] model the distribution of
discrete nodes with continuous parents by a mixture of exponentials, but then
inference is carried out by means of Monte Carlo methods. In a more general
setting, one way of using local computation is to discretize the continuous
variables [4], which are then treated as if they were qualitative.

In this paper, we study the use of mixtures of truncated exponentials to
represent the distribution of the variables in the network. This model does not
impose any restriction on the interactions among variables, either discrete
or continuous (discrete nodes with continuous parents are allowed), and exact
propagation can be performed using local computation algorithms. The main
utility of this model is to provide an alternative to discretization.
Discretization can be seen as approximating a density with a mixture of
uniforms. We think that more accurate approximations can be obtained using
exponential-shaped functions instead of uniforms.

We introduce the notation used throughout the paper in section 2. The MTE
model is presented in section 3, and the correctness of exact probability
propagation on this model is shown in section 4, where a data structure based
on probability trees is also proposed to represent MTE potentials. Section 5
describes a Markov chain Monte Carlo (MCMC) algorithm, useful when exact
propagation is infeasible. The paper ends with conclusions in section 6.

2 Notation 

A Bayesian network is a directed acyclic graph where each node represents a
random variable, and the topology of the graph shows the independence
relations among the variables, according to the d-separation criterion. Given
the independences encoded by the graph, the joint distribution is determined by
giving a probability distribution for each node conditioned on its parents.

We will denote random variables by capital letters like $X$, $Y$ and $Z$, while
boldfaced capital letters will stand for multidimensional variables. A
multidimensional variable will be denoted by $\mathbf{Y}$ if it is discrete,
$\mathbf{Z}$ if it is continuous or $\mathbf{X}$ if its components may be
discrete and continuous. If $X$ is a variable, $x$ will denote a value of that
variable. The set of possible values of a variable $X$ is denoted by
$\Omega_X$.

Given $\mathbf{x} \in \Omega_{\mathbf{X}}$ and
$\mathbf{X}' \subseteq \mathbf{X}$, we denote by
$\mathbf{x}^{\downarrow \mathbf{X}'}$ the element of $\Omega_{\mathbf{X}'}$
obtained from $\mathbf{x}$ by dropping the coordinates corresponding to
variables not in $\mathbf{X}'$.

A potential $\phi$ defined on $\Omega_{\mathbf{X}}$ is a mapping
$\phi : \Omega_{\mathbf{X}} \to \mathbb{R}_0^+$, where $\mathbb{R}_0^+$ is the
set of non-negative real numbers. Probabilistic information (including ‘a
priori’, conditional and ‘a posteriori’ distributions) will always be
represented by means of potentials, as in [7]. If $\phi$ is a potential defined
on $\Omega_{\mathbf{X}}$, $\mathrm{dom}(\phi)$ will denote the set of variables
for which $\phi$ is defined (i.e. $\mathrm{dom}(\phi) = \mathbf{X}$). We will
use the letter $\phi$ to denote a generic potential, while the letter $f$ will
be used for probability densities.



3 Mixtures of truncated exponentials 

In this section we introduce a class of mixed distributions, which we will call 
mixture of truncated exponentials. Before defining the distribution itself, we study 
the concept of potential and the basic operations over them. 

Definition 1. (MTE potential) Let $\mathbf{X}$ be a mixed n-dimensional random
variable. Let $\mathbf{Y} = (Y_1, \ldots, Y_d)$ and
$\mathbf{Z} = (Z_1, \ldots, Z_c)$ be the discrete and continuous parts of
$\mathbf{X}$, respectively, with $c + d = n$. We say that a function
$\phi : \Omega_{\mathbf{X}} \to \mathbb{R}_0^+$ is a potential of class mixture
of truncated exponentials (MTE potential) if one of the next two conditions
holds:

i. $\phi$ can be written as

$$\phi(\mathbf{x}) = \phi(\mathbf{y}, \mathbf{z}) = a_0 + \sum_{i=1}^{m} a_i \exp\left\{ \sum_{j=1}^{d} b_i^{(j)} y_j + \sum_{k=1}^{c} b_i^{(d+k)} z_k \right\} \tag{1}$$

for all $\mathbf{x} \in \Omega_{\mathbf{X}}$, where $a_i$, $i = 0, \ldots, m$
and $b_i^{(j)}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$ are real numbers.

ii. There is a partition $\Omega_1, \ldots, \Omega_k$ of $\Omega_{\mathbf{X}}$
verifying that the domain of the continuous variables, $\Omega_{\mathbf{Z}}$,
is divided into hypercubes, the domain of the discrete variables,
$\Omega_{\mathbf{Y}}$, is divided into arbitrary sets, and such that $\phi$ is
defined as

$$\phi(\mathbf{x}) = \phi_i(\mathbf{x}) \quad \text{if } \mathbf{x} \in \Omega_i ,$$

where each $\phi_i$, $i = 1, \ldots, k$ can be written in the form of equation
(1) (i.e. each $\phi_i$ is an MTE potential on $\Omega_i$).



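Example 1. The function $\phi$ defined as

$$\phi(z_1, z_2) = \begin{cases}
2 + e^{3z_1 + z_2} + e^{z_1 + z_2} & \text{if } 0 < z_1 \le 1,\ 0 < z_2 < 2 \\
1 + e^{z_1 + z_2} & \text{if } 0 < z_1 \le 1,\ 2 \le z_2 < 3 \\
\frac{1}{4} + e^{2z_1 + z_2} & \text{if } 1 < z_1 < 2,\ 0 < z_2 < 2 \\
\frac{1}{2} + 5e^{z_1 + 2z_2} & \text{if } 1 < z_1 < 2,\ 2 \le z_2 < 3
\end{cases}$$

is an MTE potential since all of its parts are MTE potentials.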
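
A piecewise MTE potential of this kind can be evaluated by locating the
hypercube that contains the point and then summing the constant and the
exponential terms of that piece. The following is only a minimal sketch: it
assumes that the four pieces of Example 1 are $2 + e^{3z_1+z_2} + e^{z_1+z_2}$,
$1 + e^{z_1+z_2}$, $\frac{1}{4} + e^{2z_1+z_2}$ and
$\frac{1}{2} + 5e^{z_1+2z_2}$, that each interval is open on the left and
closed on the right, and the names `EXAMPLE_1` and `mte_eval` are hypothetical.

```python
import math

# Each piece is (box, a0, terms).
# box   : one (low, high) interval per continuous variable (z1, z2); a point is
#         taken to lie in the box when low < z <= high (assumed convention).
# terms : list of (a_i, (b_i for z1, b_i for z2)), as in equation (1).
EXAMPLE_1 = [
    (((0, 1), (0, 2)), 2.0, [(1.0, (3.0, 1.0)), (1.0, (1.0, 1.0))]),
    (((0, 1), (2, 3)), 1.0, [(1.0, (1.0, 1.0))]),
    (((1, 2), (0, 2)), 0.25, [(1.0, (2.0, 1.0))]),
    (((1, 2), (2, 3)), 0.5, [(5.0, (1.0, 2.0))]),
]


def mte_eval(pieces, z):
    """Evaluate a piecewise MTE potential at the point z = (z1, z2)."""
    for box, a0, terms in pieces:
        if all(lo < v <= hi for v, (lo, hi) in zip(z, box)):
            return a0 + sum(
                a * math.exp(sum(b * v for b, v in zip(bs, z)))
                for a, bs in terms
            )
    return 0.0  # outside every hypercube the potential is taken to be zero


value = mte_eval(EXAMPLE_1, (0.5, 1.0))  # first piece: 2 + e^2.5 + e^1.5
```

Storing each piece as a constant plus a list of (coefficient, exponent vector)
pairs mirrors equation (1) directly, which keeps the later operations
(restriction, marginalization, combination) purely symbolic.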
3.1 Operations over MTE potentials

Three basic operations are necessary for performing probabilistic inference over 
a distribution specified as a Bayesian network: restriction, marginalization and 
combination (product). 

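Definition 2. (Restriction) Let $\phi$ be an MTE potential over
$\mathbf{X} = (\mathbf{Y}, \mathbf{Z})$. Assume a set of variables
$\mathbf{X}' = (\mathbf{Y}', \mathbf{Z}') \subseteq \mathbf{X}$, whose values
are fixed
($\mathbf{x}^{\downarrow \mathbf{X}'} = \mathbf{x}' = (\mathbf{y}', \mathbf{z}')$).
The restriction of $\phi$ to the values $(\mathbf{y}', \mathbf{z}')$ is a new
potential defined on $\Omega_{\mathbf{X} \setminus \mathbf{X}'}$ according to
the following expression:

$$\phi^{R(\mathbf{X}' = \mathbf{x}')}(\mathbf{w}) = \phi(\mathbf{x}) \tag{2}$$

for all $\mathbf{w} \in \Omega_{\mathbf{X} \setminus \mathbf{X}'}$ such that
$\mathbf{x} \in \Omega_{\mathbf{X}}$,
$\mathbf{x}^{\downarrow \mathbf{X} \setminus \mathbf{X}'} = \mathbf{w}$ and
$\mathbf{x}^{\downarrow \mathbf{X}'} = \mathbf{x}'$. In other words, the
restriction is the potential obtained by replacing every occurrence of
$\mathbf{X}'$ by the value $\mathbf{x}'$.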

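Definition 3. (Marginalization) Let $\phi$ be an MTE potential over
$\mathbf{X} = (\mathbf{Y}, \mathbf{Z})$. The marginal of $\phi$ for a set of
variables $\mathbf{X}' = (\mathbf{Y}', \mathbf{Z}') \subseteq \mathbf{X}$ is
the potential computed as

$$\phi^{\downarrow \mathbf{X}'}(\mathbf{y}', \mathbf{z}') = \sum_{\mathbf{y} \in \Omega_{\mathbf{Y} \setminus \mathbf{Y}'}} \int_{\Omega_{\mathbf{Z}''}} \phi(\mathbf{y}, \mathbf{z}) \, d\mathbf{z}'' \tag{3}$$

where $\mathbf{Z}'' = \mathbf{Z} \setminus \mathbf{Z}'$. Observe that this
function is defined on $\Omega_{\mathbf{X}'}$.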



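Example 2. Assume we want to obtain the marginal of the potential in example 1
for variable $Z_2$. This is achieved by integrating over $Z_1$ and simplifying
afterwards:

$$\phi^{\downarrow Z_2}(z_2) = \begin{cases}
\frac{9}{4} + \left( \frac{1}{3}(e^3 - 1) + (e - 1) + \frac{1}{2}(e^4 - e^2) \right) e^{z_2} & \text{if } 0 < z_2 < 2 \\
\frac{3}{2} + (e - 1)e^{z_2} + 5(e^2 - e)e^{2z_2} & \text{if } 2 \le z_2 < 3
\end{cases}$$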
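
Integrating a continuous variable out of an MTE piece has a closed form,
because each exponential term integrates to another exponential term. The
sketch below is an illustration under assumptions rather than the authors'
implementation: it takes the two pieces of Example 1 that cover
$2 \le z_2 < 3$ to be $1 + e^{z_1+z_2}$ on $0 < z_1 \le 1$ and
$\frac{1}{2} + 5e^{z_1+2z_2}$ on $1 < z_1 < 2$, and the helper names
`integrate_out_z1` and `eval_1d` are hypothetical.

```python
import math


def integrate_out_z1(a0, terms, lo, hi):
    """Integrate a0 + sum a*exp(b1*z1 + b2*z2) over z1 in (lo, hi).

    terms is a list of (a, b1, b2). The result is an MTE expression in z2
    alone, returned as (constant, [(coefficient, b2), ...]).
    """
    const = a0 * (hi - lo)
    out = []
    for a, b1, b2 in terms:
        if b1 == 0:
            factor = hi - lo
        else:
            factor = (math.exp(b1 * hi) - math.exp(b1 * lo)) / b1
        out.append((a * factor, b2))
    return const, out


def eval_1d(const, terms, z2):
    """Evaluate constant + sum coef*exp(b*z2)."""
    return const + sum(c * math.exp(b * z2) for c, b in terms)


# The two pieces whose z2 range is [2, 3), integrated over their z1 ranges.
c1, t1 = integrate_out_z1(1.0, [(1.0, 1.0, 1.0)], 0.0, 1.0)
c2, t2 = integrate_out_z1(0.5, [(5.0, 1.0, 2.0)], 1.0, 2.0)
marg_const, marg_terms = c1 + c2, t1 + t2

# Closed form stated in Example 2 for 2 <= z2 < 3.
z2 = 2.5
closed_form = (1.5 + (math.e - 1) * math.exp(z2)
               + 5 * (math.e ** 2 - math.e) * math.exp(2 * z2))
agrees = math.isclose(eval_1d(marg_const, marg_terms, z2), closed_form)
```

The result is again a constant plus a sum of exponentials in $z_2$, which is
the closure property the marginalization operation relies on.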

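Definition 4. (Combination) Let $\phi_1$ and $\phi_2$ be MTE potentials over
$\mathbf{X}_1 = (\mathbf{Y}_1, \mathbf{Z}_1)$ and
$\mathbf{X}_2 = (\mathbf{Y}_2, \mathbf{Z}_2)$ respectively. The combination of
$\phi_1$ and $\phi_2$ is a new potential defined over
$\mathbf{X} = \mathbf{X}_1 \cup \mathbf{X}_2$ computed as

$$\phi(\mathbf{x}) = \phi_1(\mathbf{x}^{\downarrow \mathbf{X}_1}) \cdot \phi_2(\mathbf{x}^{\downarrow \mathbf{X}_2}) \quad \text{for all } \mathbf{x} \in \Omega_{\mathbf{X}} . \tag{4}$$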



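Example 3. In order to illustrate this operation, we will combine the potential
in example 1 with this one:

$$\phi'(x) = \begin{cases}
3 + e^{x} & \text{if } 0 < x \le 2 \\
2 + 5e^{-2x} & \text{if } 2 < x < 4
\end{cases}$$

The result will be an MTE potential
$\phi''(z_1, z_2, x) = \phi(z_1, z_2) \cdot \phi'(x)$ defined in 8 regions:

$$\phi''(z_1, z_2, x) = \begin{cases}
6 + 2e^{x} + 3e^{3z_1+z_2} + 3e^{z_1+z_2} + e^{3z_1+z_2+x} + e^{z_1+z_2+x} & \text{if } 0 < z_1 \le 1,\ 0 < z_2 < 2,\ 0 < x \le 2 \\
4 + 10e^{-2x} + 2e^{3z_1+z_2} + 2e^{z_1+z_2} + 5e^{3z_1+z_2-2x} + 5e^{z_1+z_2-2x} & \text{if } 0 < z_1 \le 1,\ 0 < z_2 < 2,\ 2 < x < 4 \\
3 + e^{x} + 3e^{z_1+z_2} + e^{z_1+z_2+x} & \text{if } 0 < z_1 \le 1,\ 2 \le z_2 < 3,\ 0 < x \le 2 \\
2 + 5e^{-2x} + 2e^{z_1+z_2} + 5e^{z_1+z_2-2x} & \text{if } 0 < z_1 \le 1,\ 2 \le z_2 < 3,\ 2 < x < 4 \\
\frac{3}{4} + \frac{1}{4}e^{x} + 3e^{2z_1+z_2} + e^{2z_1+z_2+x} & \text{if } 1 < z_1 < 2,\ 0 < z_2 < 2,\ 0 < x \le 2 \\
\frac{1}{2} + \frac{5}{4}e^{-2x} + 2e^{2z_1+z_2} + 5e^{2z_1+z_2-2x} & \text{if } 1 < z_1 < 2,\ 0 < z_2 < 2,\ 2 < x < 4 \\
\frac{3}{2} + \frac{1}{2}e^{x} + 15e^{z_1+2z_2} + 5e^{z_1+2z_2+x} & \text{if } 1 < z_1 < 2,\ 2 \le z_2 < 3,\ 0 < x \le 2 \\
1 + \frac{5}{2}e^{-2x} + 10e^{z_1+2z_2} + 25e^{z_1+2z_2-2x} & \text{if } 1 < z_1 < 2,\ 2 \le z_2 < 3,\ 2 < x < 4
\end{cases}$$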

Proposition 1. The class of MTE potentials is closed under restriction,
marginalization and combination.

Proof. This result can be proved relying on the fact that replacing a variable
by one of its values, integrating over a variable, summing out a variable and
multiplying two MTE potentials always result in a new MTE potential. For the
sake of simplicity we omit the details of the proof.
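
Closure under combination can be made concrete for a single pair of pieces:
multiplying two expressions of the form of equation (1) multiplies the
coefficients and adds the exponents, so the product is again a constant plus a
sum of exponentials. The sketch below is only illustrative. It assumes the
pieces $1 + e^{z_1+z_2}$ and $3 + e^{x}$, and the dictionary encoding and the
function name `combine` are hypothetical choices, not part of the paper.

```python
from collections import defaultdict


def combine(p, q):
    """Multiply two MTE expressions.

    Each expression maps a sorted tuple of (variable, exponent coefficient)
    pairs to a real coefficient; the empty tuple is the constant term a0.
    """
    result = defaultdict(float)
    for exp_p, a in p.items():
        for exp_q, b in q.items():
            merged = defaultdict(float)
            for var, k in exp_p + exp_q:
                merged[var] += k  # exponents of a shared variable add up
            key = tuple(sorted((v, k) for v, k in merged.items() if k != 0))
            result[key] += a * b  # coefficients multiply
    return dict(result)


phi = {(): 1.0, (("z1", 1.0), ("z2", 1.0)): 1.0}  # 1 + e^{z1+z2}
phi_prime = {(): 3.0, (("x", 1.0),): 1.0}         # 3 + e^{x}
product = combine(phi, phi_prime)
# product encodes 3 + e^{x} + 3e^{z1+z2} + e^{z1+z2+x}
```

A full combination of two piecewise potentials would apply this product to
every pair of intersecting regions, which is why the number of pieces can grow
multiplicatively.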



3.2 MTE distributions 

Based on MTE potentials, a probability distribution can be defined as follows. 

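Definition 5. (MTE distribution) Let $\mathbf{X} = (\mathbf{Y}, \mathbf{Z})$ be
an n-dimensional mixed random variable. We say that $\mathbf{X}$ follows an MTE
distribution if its density $f$ verifies that $f$ is an MTE potential and

$$\sum_{\mathbf{y} \in \Omega_{\mathbf{Y}}} \int_{\Omega_{\mathbf{Z}}} f(\mathbf{y}, \mathbf{z}) \, d\mathbf{z} = 1 .$$

A potential verifying these conditions will be called an MTE density for
$\mathbf{X}$.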

In the framework of Bayesian networks, the model is specified as a set of 
conditional distributions rather than joint distributions. The conditional MTE 
distribution can be defined as follows: 

Definition 6. (Conditional MTE density) Let
$\mathbf{X}_1 = (\mathbf{Y}_1, \mathbf{Z}_1)$ and
$\mathbf{X}_2 = (\mathbf{Y}_2, \mathbf{Z}_2)$ be two mixed random variables. We
say that an MTE potential $f$ defined over
$\Omega_{\mathbf{X}_1 \cup \mathbf{X}_2}$ is a conditional MTE density if for
each $\mathbf{x}_2 \in \Omega_{\mathbf{X}_2}$, it holds that
$f^{R(\mathbf{X}_2 = \mathbf{x}_2)}$ is an MTE density for $\mathbf{X}_1$.

What makes the MTE distribution worth studying is its versatility. Many models
can be approximated by an MTE distribution, and some common models can be
represented exactly; for instance, the uniform and the multinomial
distributions are particular cases of it.

Usually, when specifying a model as a Bayesian network, we give a set of 
conditional distributions that, according to the chain rule, corresponds to a fac- 
torization of the joint distribution of the variables in the network. If the condi- 
tional distributions are of class MTE, the next proposition shows that the joint 
distribution is also of class MTE. 

Proposition 2. Let $G$ be a Bayesian network over an n-dimensional mixed
variable $\mathbf{X} = (\mathbf{Y}, \mathbf{Z})$. If every conditional
distribution associated with $G$ corresponds to a conditional MTE density, then
the joint distribution over $\mathbf{X}$ can be represented by an MTE density.

Proof. This is a consequence of two facts: the product of the n densities in
the network is the joint distribution over the n-dimensional variable
$\mathbf{X}$, and the product of all the densities associated with the Bayesian
network is an MTE potential.

A Bayesian network representing an MTE distribution will be called an MTE 
network. 

4 Probability propagation in MTE networks 

Consider an MTE network defined for an n-dimensional variable $\mathbf{X}$. Let
$f$ denote the joint distribution of $\mathbf{X}$. If we denote by
$f_i(x_i \mid \mathbf{x}_{pa(X_i)})$ the conditional MTE density of variable
$X_i$, $i = 1, \ldots, n$, given its parents $\mathbf{X}_{pa(X_i)}$, then it
holds that
$f(\mathbf{x}) = \prod_{i=1}^{n} f_i(x_i \mid \mathbf{x}_{pa(X_i)})$.

Probability propagation consists of obtaining the marginal distribution on
some variables of interest given some observations. An observation is the
knowledge about the exact value $X_i = e_i$ of a variable. The set of
observations will be denoted by $\mathbf{e}$, and called the evidence set.
$\mathbf{E}$ will be the set of variables observed. Every observation,
$X_i = e_i$, is represented by means of an MTE potential defined on
$\Omega_{X_i}$ as $\delta_i(x_i; e_i) = 1$ if $e_i = x_i$,
$x_i \in \Omega_{X_i}$, and $\delta_i(x_i; e_i) = 0$ if $e_i \neq x_i$.

Using this notation, probability propagation can be defined as calculating the
‘a posteriori’ probability function $f(x_i \mid \mathbf{e})$, for every
unobserved variable $X_i \in \mathbf{X} \setminus \mathbf{E}$. Notice that

$$f(x_i \mid \mathbf{e}) = \frac{f^{\downarrow X_i}(x_i, \mathbf{e})}{f^{\downarrow \mathbf{E}}(\mathbf{e})} , \tag{5}$$

where $f^{\downarrow \mathbf{E}}(\mathbf{e})$ denotes the marginal of the joint
distribution $f$ evaluated for observations $\mathbf{e}$¹. Thus, obtaining
$f(x_i \mid \mathbf{e})$ is equivalent to computing
$f^{\downarrow X_i}(x_i, \mathbf{e})$ (observe that $\mathbf{e}$ is fixed) and
normalizing afterwards.

¹ From now on, whenever the term $\mathbf{e}$ is added to the argument of a
function, we mean that if a variable in $\mathbf{E}$ is in the domain of the
function, then that variable is fixed to its value in $\mathbf{e}$.
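If we call
$H = \{ f_i(x_i \mid \mathbf{x}_{pa(X_i)}) \mid i = 1, \ldots, n \} \cup \{ \delta_i(x_i; e_i) \mid e_i \in \mathbf{e} \}$,
then for any variable $X_i \in \mathbf{X}$

$$f^{\downarrow X_i}(x_i, \mathbf{e}) = \left( \prod_{\phi \in H} \phi \right)^{\downarrow X_i} (x_i) . \tag{6}$$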



The computation of $f^{\downarrow X_i}(x_i, \mathbf{e})$ is usually organized
in a join tree. A join tree is a tree where each node $V$ is a subset of
$\mathbf{X}$, and such that if a variable is in two distinct nodes, $V_1$ and
$V_2$, then it is also in every node in the path between $V_1$ and $V_2$.

Every density $\phi \in H$ is assigned to a node $V_j$ such that
$\mathrm{dom}(\phi) \subseteq V_j$, in order to obtain an MTE potential
$\phi_{V_j}$ defined over the set of variables $V_j$ and that is equal to the
product of all the densities assigned to it.

The node marginals $f^{\downarrow V_j}(\mathbf{x}, \mathbf{e})$ can be
calculated by means of a propagation algorithm in a join tree. Afterwards,
$f^{\downarrow X_i}(x_i, \mathbf{e})$ can be obtained from any node $V_j$
containing variable $X_i$. One possibility is to use the Shenoy-Shafer
propagation algorithm [13]. In the Shenoy-Shafer scheme, two mailboxes are
placed on each edge of the join tree. Given an edge connecting nodes $V_i$ and
$V_j$, one mailbox is for messages $V_i$-outgoing and $V_j$-incoming, and the
other mailbox is for the reverse. The messages allocated in both mailboxes will
be MTE potentials defined on $V_i \cap V_j$. Initially, each mailbox is empty,
and once a message has been placed in one of them, it is said to be full.

A node $V_i$ in a join tree is allowed to send a message to its neighbour node
$V_j$ if and only if all $V_i$-incoming mailboxes are full except the one from
$V_j$ to $V_i$. Thus, initially only nodes corresponding to leaves can send
messages. The message $V_i$-outgoing and $V_j$-incoming is computed as

$$\phi_{V_i \to V_j} = \left( \phi_{V_i} \cdot \prod_{V_k \in ne(V_i) \setminus \{V_j\}} \phi_{V_k \to V_i} \right)^{\downarrow V_i \cap V_j} , \tag{7}$$

where $\phi_{V_i}$ is the initial MTE potential on $V_i$,
$\phi_{V_k \to V_i}$ are the messages in the mailboxes $V_k$-outgoing and
$V_i$-incoming and $ne(V_i)$ are the neighbour nodes of $V_i$. Note that one
message contains the information coming from one side of the tree and is sent
to the other side of the tree. It can be shown [13] that there is always at
least one node allowed to send a message until all mailboxes are full, and when
the message passing ends, for every node $V_i$ in the join tree it holds that
the marginal of the joint distribution $f$ on variables $V_i$ is

$$f^{\downarrow V_i}(\mathbf{x}, \mathbf{e}) = \phi_{V_i}(\mathbf{x}, \mathbf{e}) \cdot \left( \prod_{V_k \in ne(V_i)} \phi_{V_k \to V_i} \right)(\mathbf{x}, \mathbf{e}), \quad \mathbf{x} \in \Omega_{V_i} , \tag{8}$$

which is proportional to the conditional distribution of the variables in $V_i$
given observation $\mathbf{e}$. The desired conditional probability for
variable $X_i$ can then be calculated by marginalizing $f^{\downarrow V_i}$
over this variable and normalizing the result.
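
The message passing of equations (7) and (8) can be traced on a very small
join tree. The sketch below is an illustration under stated assumptions, not
the authors' implementation: it uses a purely discrete network
$A \to B \to C$ (discrete tables are a special case of MTE potentials), a join
tree with two nodes $V_1 = \{A, B\}$ and $V_2 = \{B, C\}$, evidence $C = 1$,
and made-up probability values. All names are hypothetical.

```python
from itertools import product


def marginalize(variables, table, keep):
    """Sum a discrete potential down to the variables listed in keep."""
    idx = [variables.index(v) for v in keep]
    out = {}
    for cfg, val in table.items():
        key = tuple(cfg[i] for i in idx)
        out[key] = out.get(key, 0.0) + val
    return out


def normalize(table):
    total = sum(table.values())
    return {k: v / total for k, v in table.items()}


# Hypothetical conditional tables for binary variables.
p_a = {0: 0.6, 1: 0.4}
p_b_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key (b, a)
p_c_b = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # key (c, b)
evidence_c = 1

# Initial node potentials; the evidence indicator delta is assigned to V2.
phi_v1 = {(a, b): p_a[a] * p_b_a[(b, a)]
          for a, b in product((0, 1), repeat=2)}
phi_v2 = {(b, c): p_c_b[(c, b)] * (1.0 if c == evidence_c else 0.0)
          for b, c in product((0, 1), repeat=2)}

# Equation (7): each node has a single neighbour, so the product over the
# other incoming messages is empty and the message is just a marginal on {B}.
msg_12 = marginalize(["A", "B"], phi_v1, ["B"])
msg_21 = marginalize(["B", "C"], phi_v2, ["B"])

# Equation (8): node potential times all incoming messages.
f_v1 = {(a, b): phi_v1[(a, b)] * msg_21[(b,)] for (a, b) in phi_v1}
f_v2 = {(b, c): phi_v2[(b, c)] * msg_12[(b,)] for (b, c) in phi_v2}

# The posterior of B can be read from either node after normalizing.
post_b_from_v1 = normalize(marginalize(["A", "B"], f_v1, ["B"]))
post_b_from_v2 = normalize(marginalize(["B", "C"], f_v2, ["B"]))
```

Both nodes yield the same posterior for $B$, which is the consistency
property that makes it possible to read a variable's marginal from any node
containing it.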
Observe that probability propagation just requires combinations and
marginalizations of MTE potentials. Thus, any new potential resulting from an
operation along the propagation is also of class MTE (see proposition 1). This
fact is reflected in the next proposition.

Proposition 3. The class of MTE potentials is closed under Shenoy-Shafer
propagation.

Proof. Initially, the potential stored in each node of the join tree is the
product of some MTE densities. After propagation this fact still holds, since
only combinations and marginalizations are performed, and these operations are
closed for MTE potentials. Thus, the marginal densities computed in the
Shenoy-Shafer propagation, $f^{\downarrow V_i}$, are also of class MTE, and so
is the conditional density $f(x_i \mid \mathbf{e})$ in equation (5), since the
division of an MTE potential by a real number is also an MTE potential.

According to this proposition, the entire propagation in an MTE network
operates with MTE potentials.



Fig. 1. A mixed probability tree representing potential $\phi$ in example 4.



As a data structure for representing and operating with MTE potentials, we 
propose the use of an extended version of probability trees [12], that we call 
mixed probability trees. 

Definition 7. (Mixed probability tree) We say that a tree $T$ is a mixed
probability tree if it meets the following conditions:

i. Every internal node represents a random variable (discrete or continuous).

ii. Every arc outgoing from a continuous variable $Z$ is labeled with an
interval of values of $Z$, so that the domain of $Z$ is the union of the
intervals corresponding to the arcs outgoing from $Z$.

iii. Every discrete variable has one outgoing arc for each of its states.

iv. Each leaf node contains an MTE potential defined on variables in the path
from the root to that leaf.

Mixed probability trees can represent MTE potentials defined by parts. Each 
entire branch in the tree determines one sub-region of the space where the po- 
tential is defined, and the function stored in the leaf of a branch is the definition 
of the potential in the corresponding sub-region. 
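
A mixed probability tree can be sketched as a small recursive data structure:
discrete nodes branch on states, continuous nodes branch on intervals, and
leaves hold an expression of the form of equation (1). The code below is a
minimal sketch under assumptions. The class names, the half-open interval
convention and the toy tree are all hypothetical and do not reproduce the tree
of Fig. 1.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    a0: float
    # Each term is (a_i, {variable name: exponent coefficient}).
    terms: List[Tuple[float, Dict[str, float]]] = field(default_factory=list)

    def value(self, point):
        return self.a0 + sum(
            a * math.exp(sum(b * point[v] for v, b in exps.items()))
            for a, exps in self.terms
        )


@dataclass
class DiscreteNode:
    var: str
    children: Dict[int, "Tree"]  # one child per state


@dataclass
class ContinuousNode:
    var: str
    # (low, high, subtree); a value z follows the arc with low < z <= high.
    children: List[Tuple[float, float, "Tree"]]


Tree = Union[Leaf, DiscreteNode, ContinuousNode]


def evaluate(tree, point):
    """Follow one branch of the tree and evaluate the leaf it reaches."""
    while not isinstance(tree, Leaf):
        if isinstance(tree, DiscreteNode):
            tree = tree.children[point[tree.var]]
        else:
            z = point[tree.var]
            for lo, hi, sub in tree.children:
                if lo < z <= hi:
                    tree = sub
                    break
            else:
                return 0.0  # z lies outside every interval of this node
    return tree.value(point)


# Toy tree: Y1 at the root, Z1 below the branch Y1 = 0.
toy = DiscreteNode("Y1", {
    0: ContinuousNode("Z1", [
        (0.0, 1.0, Leaf(2.0, [(1.0, {"Z1": 3.0})])),   # 2 + e^{3 z1}
        (1.0, 2.0, Leaf(0.25, [(1.0, {"Z1": 2.0})])),  # 1/4 + e^{2 z1}
    ]),
    1: Leaf(1.0),                                      # constant 1
})
result = evaluate(toy, {"Y1": 0, "Z1": 0.5})
```

Because a branch can stop early (as for $Y_1 = 1$ above), regions on which the
potential does not depend on further variables are stored once, which is the
space saving that motivates the tree representation.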

Example 4. Consider the following MTE potential, defined for a discrete
variable ($Y_1$) and two continuous variables ($Z_1$ and $Z_2$).

$$\phi(y_1, z_1, z_2) = \begin{cases}
2 + e^{3z_1 + z_2} & \text{if } y_1 = 0,\ 0 < z_1 \le 1,\ 0 < z_2 < 2 \\
1 + \dots & \text{if } y_1 = 0,\ 0 < z_1 \le 1,\ 2 \le z_2 < 3 \\
\frac{1}{4} + e^{2z_1 + z_2} & \text{if } y_1 = 0,\ 1 < z_1 < 2,\ 0 < z_2 < 2 \\
\dots & \text{if } y_1 = 0,\ 1 < z_1 < 2,\ 2 \le z_2 < 3 \\
1 + \dots & \text{if } y_1 = 1,\ 0 < z_1 \le 1,\ 0 < z_2 < 2 \\
1 + 2e^{z_1 + z_2} + e^{z_1 + z_2} & \text{if } y_1 = 1,\ 0 < z_1 \le 1,\ 2 \le z_2 < 3 \\
\dots + e^{z_1 + z_2} & \text{if } y_1 = 1,\ 1 < z_1 < 2,\ 0 < z_2 < 2 \\
\frac{1}{2} + \dots & \text{if } y_1 = 1,\ 1 < z_1 < 2,\ 2 \le z_2 < 3
\end{cases}$$

A possible representation of this potential by means of a mixed probability 
tree is displayed in figure 1. 



The operations of restriction, marginalization and combination over mixed
probability trees can be carried out by means of algorithms very similar to
those described by Kozlov and Koller [4] and Salmerón, Cano and Moral [12].



5 A Markov chain Monte Carlo propagation algorithm 

In section 4 we showed that exact propagation can be carried out in MTE
networks by means of the Shenoy-Shafer algorithm, and suggested a possible
representation of MTE potentials that allows the propagation algorithm to be
implemented. However, during the propagation, the size of the potentials²
involved in the calculations may grow so much that the propagation becomes
infeasible. Instead, the posterior probabilities can be estimated using Markov
chain Monte Carlo.

Markov chain Monte Carlo propagation [10] proceeds by generating a sample of
the variables in the network, and then uses that sample to estimate the
distribution of the variables of interest. The sample is generated from an
initial configuration of the variables, where the observed variables
$\mathbf{E}$ are instantiated to observation $\mathbf{e}$. This initial
configuration can be obtained by forward sampling.

² By the size of a potential we mean the number of leaves of the mixed
probability tree representing it.

Mixtures of Truncated Exponentials in Hybrid Bayesian Networks

Once we have a starting configuration, a new one is obtained by changing
the value of the unobserved variables one by one. The new value for a variable
$X_i$ is obtained by simulating from the distribution function corresponding to
the product of its conditional distribution
$f_i(x_i \mid \mathbf{x}_{pa(X_i)})$ and the distribution of the variables in
its Markov blanket (i.e. its parents, children and parents of its children
except $X_i$) restricted to the values of all the variables but $X_i$ in the
current configuration.

The simulation procedure described above can be applied to MTE networks.
The only aspect to clarify is how to simulate a value for a variable with an
MTE distribution.



5.1 Simulating from the MTE distribution 

Observe that when we are going to simulate a variable Xi we simulate from a 
distribution that depends only on Xi and that distribution is of class MTE. If 
Xi is discrete, it is straightforward to simulate a value for it: a random number 
is generated and the inverse transform method is applied [11]. 

If $X_i$ is continuous, we may find its density defined in several pieces and,
besides, the inverse transform method cannot in general be applied to mixtures
of exponentials. However, values can be obtained by applying the composition
method [11] twice.

Assume that the density used to simulate a variable is defined as
$f(x) = f_i(x)$ for $\alpha_i < x < \beta_i$, $i = 1, \ldots, k$, where each of
the functions $f_i$ is of the form

$$f_i(x) = a_0 + a_1 e^{k_1 x} + a_2 e^{k_2 x} + \cdots + a_n e^{k_n x}, \quad \alpha_i < x < \beta_i . \tag{9}$$

The way to simulate from $f(x)$ by the composition method is to choose one
of the $f_i$ with probability equal to
$\int_{\alpha_i}^{\beta_i} f_i(x) \, dx$ and then simulate a value inside
interval $(\alpha_i, \beta_i)$ from a density $f_i^*$ proportional to $f_i$:

$$f_i^*(x) = \frac{f_i(x)}{\int_{\alpha_i}^{\beta_i} f_i(x) \, dx}, \quad \alpha_i < x < \beta_i ,$$

which is also a function of the form of equation (9).

In order to simulate from $f_i^*$ we have to apply the composition method
again. The steps to apply this method are:

1. Decompose density $f_i^*(x)$ as a weighted sum of densities

$$f_i^*(x) = p_1 f'_1(x) + \cdots + p_m f'_m(x) \tag{10}$$

with $\sum_{j=1}^{m} p_j = 1$.

2. Generate two random numbers $u_1$ and $u_2$.

3. Use $u_1$ to choose one $f'_j$ with probability $p_j$, and use $u_2$ to
obtain a value for variable $X$ applying the inverse transform method to the
distribution function corresponding to $f'_j$.
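The decomposition in (10) must be such that the inverse of the distribution
function corresponding to each $f'_j$ can be computed.

We can obtain such a decomposition as follows. Define
$c_j = \int_{\alpha_i}^{\beta_i} e^{k_j x} \, dx = \frac{1}{k_j}\left( e^{k_j \beta_i} - e^{k_j \alpha_i} \right)$,
$j = 1, \ldots, n$. Then $f'_j(x) = \frac{1}{c_j} e^{k_j x}$,
$j = 1, \ldots, n$ is a density function in $(\alpha_i, \beta_i)$. For $j = 0$,
$c_0 = \int_{\alpha_i}^{\beta_i} dx = \beta_i - \alpha_i$ and it holds that
$f'_0(x) = \frac{1}{c_0}$ is a density in $(\alpha_i, \beta_i)$. With this,
multiplying and dividing each term by the corresponding $c_j$,

$$f_i^*(x) = a_0 c_0 \frac{1}{c_0} + a_1 c_1 \frac{1}{c_1} e^{k_1 x} + a_2 c_2 \frac{1}{c_2} e^{k_2 x} + \cdots + a_n c_n \frac{1}{c_n} e^{k_n x}, \quad \alpha_i < x < \beta_i ,$$

where we can take as weights $p_j = a_j c_j$, $j = 0, \ldots, n$. It can be
easily verified that these weights sum up to one.

Finally, the inverse of the distribution function of each $f'_j$ constructed as
described here can be easily computed. If $j = 0$, the distribution is just the
uniform in $(\alpha_i, \beta_i)$. If $j > 0$, for
$x \in (\alpha_i, \beta_i)$, the distribution function is

$$F_j(x) = \int_{-\infty}^{x} f'_j(t) \, dt = \int_{\alpha_i}^{x} \frac{1}{c_j} e^{k_j t} \, dt = \frac{e^{k_j x} - e^{k_j \alpha_i}}{c_j k_j} .$$

To obtain the inverse, for a random number $0 < u < 1$ we take

$$u = \frac{e^{k_j x} - e^{k_j \alpha_i}}{c_j k_j} \;\Longrightarrow\; x = \frac{1}{k_j} \log\left( c_j k_j u + e^{k_j \alpha_i} \right) .$$

Thus, for $j > 0$,
$F_j^{-1}(u) = \frac{1}{k_j} \log\left( c_j k_j u + e^{k_j \alpha_i} \right)$,
$0 < u < 1$.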
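
The second application of the composition method, within a single interval,
can be sketched as follows. This is an illustration under assumptions rather
than a definitive implementation: it assumes that all coefficients $a_j$ are
non-negative (so that the weights $a_j c_j$ are valid probabilities after
normalization) and that $k_j \neq 0$ for $j \geq 1$; the function name
`sample_mte_piece` and its argument layout are hypothetical.

```python
import math
import random


def sample_mte_piece(a, k, alpha, beta, rng=random):
    """Draw one value from a density proportional to
    a[0] + sum_j a[j] * exp(k[j] * x) on (alpha, beta).

    Assumes a[j] >= 0 for all j and k[j] != 0 for j >= 1 (k[0] is unused).
    """
    # c_j: integral of exp(k_j x) over (alpha, beta); c_0 is the length.
    c = [beta - alpha] + [
        (math.exp(kj * beta) - math.exp(kj * alpha)) / kj for kj in k[1:]
    ]
    weights = [aj * cj for aj, cj in zip(a, c)]  # unnormalized p_j = a_j c_j
    total = sum(weights)
    u1, u2 = rng.random(), rng.random()

    # Use u1 to choose component j with probability weights[j] / total.
    acc, j = 0.0, 0
    for j, w in enumerate(weights):
        acc += w
        if u1 * total < acc:
            break

    # Use u2 to invert the distribution function of the chosen component.
    if j == 0:
        return alpha + u2 * (beta - alpha)  # uniform component
    return math.log(c[j] * k[j] * u2 + math.exp(k[j] * alpha)) / k[j]


rng = random.Random(0)
draws = [sample_mte_piece([1.0, 2.0, 0.5], [0.0, 1.0, -2.0], 0.0, 1.0, rng)
         for _ in range(1000)]
```

Dividing by the sum of the weights means the function also accepts the
unnormalized $f_i$ of equation (9) directly, so the explicit normalization to
$f_i^*$ is not needed before sampling.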

5.2 Estimating posterior probabilities from a sample 

Once the sample is obtained, it can be used to estimate values such as
$P(a < X_i < b)$ by counting configurations in the sample for which $X_i$
falls into $(a, b)$.
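
The counting estimate amounts to a relative frequency. A minimal sketch,
assuming the sampled values of $X_i$ are available as a plain list (the
function name and the sample values are hypothetical):

```python
def estimate_interval_probability(sample, a, b):
    """Fraction of sampled values of X_i that fall into the interval (a, b)."""
    hits = sum(1 for x in sample if a < x < b)
    return hits / len(sample)


# Hypothetical sampled values of X_i.
estimate = estimate_interval_probability([0.1, 0.4, 0.6, 0.9], 0.3, 0.7)
```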

Another possibility is to give an estimation of the posterior density for
variable $X_i$. Since we know that propagation is closed for MTE distributions,
the sample could be used to estimate the parameters of the distribution. This
is not a trivial task, and two problems must be addressed: into how many
intervals the domain of variable $X_i$ is to be divided and how many terms are
added to the mixture of exponentials in each interval.

This estimation process can be seen as a particular case of a more general 
one: learning MTE networks from data, that we will address in forthcoming 
works. 



6 Conclusions 

We have introduced the MTE distribution as a model for dealing with hybrid
Bayesian networks. We have shown how MTE networks allow local computation
propagation algorithms, particularly the Shenoy-Shafer scheme. Any other
scheme that involves the restriction, combination and marginalization
operations is also valid for MTE networks.

Besides, we have described a Markov chain Monte Carlo algorithm for prop- 
agating in complex MTE networks. 

Much more work must be done to complete the study of the MTE models.
For instance, other simulation algorithms such as importance sampling [12]
could be applied. Our current work is concerned with the design of methods for
learning MTE networks from data, and we are also investigating the extension of
the MTE model to allow incorporating distributions specified as mixtures of
Gaussians.



References 

1. A. Cano, S. Moral, and A. Salmerón. Penniless propagation in join trees.
International Journal of Intelligent Systems, 15:1027-1059, 2000.

2. F.V. Jensen, S.L. Lauritzen, and K.G. Olesen. Bayesian updating in causal prob- 
abilistic networks by local computation. Comput Stat Quarterly, 4:269-282, 1990. 

3. D. Koller, U. Lerner, and D. Anguelov. A general algorithm for approximate
inference and its application to hybrid Bayes nets. In K.B. Laskey and H. Prade,
editors, Proceedings of the 15th Conference on Uncertainty in Artificial
Intelligence, pages 324-333. Morgan Kaufmann, 1999.

4. D. Kozlov and D. Koller. Nonuniform dynamic discretization in hybrid
networks. In D. Geiger and P.P. Shenoy, editors, Proceedings of the 13th
Conference on Uncertainty in Artificial Intelligence, pages 302-313. Morgan
Kaufmann, 1997.

5. S.L. Lauritzen. Propagation of probabilities, means and variances in mixed graph- 
ical association models. Journal of the American Statistical Association, 87:1098- 
1108, 1992. 

6. S.L. Lauritzen and F. Jensen. Stable local computation with conditional Gaussian 
distributions. Statistics and Computing, 11:191-203, 2001. 

7. S.L. Lauritzen and D.J. Spiegelhalter. Local computations with probabilities on 
graphical structures and their application to expert systems. Journal of the Royal 
Statistical Society, Series B, 50:157-224, 1988. 

8. A.L. Madsen and F.V. Jensen. Lazy propagation: a junction tree inference algo- 
rithm based on lazy evaluation. Artif Intell, 113:203-245, 1999. 

9. K.G. Olesen. Causal probabilistic networks with both discrete and continuous 
variables. IEEE Trans on Pattern Analysis and Machine Intell, 15:275-279, 1993. 

10. J. Pearl. Evidential reasoning using stochastic simulation of causal models. Arti- 
ficial Intelligence, 32:247-257, 1987. 

11. R.Y. Rubinstein. Simulation and the Monte Carlo Method. Wiley, 1981. 

12. A. Salmerón, A. Cano, and S. Moral. Importance sampling in Bayesian networks
using probability trees. Computational Statistics and Data Analysis, 34:387-413,
2000.

13. P.P. Shenoy and G. Shafer. Axioms for probability and belief function
propagation. In R.D. Shachter, T.S. Levitt, J.F. Lemmer, and L.N. Kanal,
editors, Uncertainty in Artificial Intelligence 4, pages 169-198. North
Holland, Amsterdam, 1990.




Importance Sampling in Bayesian Networks 
Using Antithetic Variables 



Antonio Salmerón¹ and Serafín Moral²

¹ Dpt. Statistics and Applied Mathematics
University of Almería
La Cañada de San Urbano s/n
04120 Almería, Spain
Antonio.Salmeron@ual.es
² Dpt. Computer Science and Artificial Intelligence
University of Granada
Avda. Andalucía 38,
18071 Granada, Spain
smc@decsai.ugr.es


Abstract. In this paper we introduce an improvement over importance 
sampling propagation algorithms in Bayesian networks. The difference 
with respect to importance sampling is that during the simulation, con- 
figurations are obtained using antithetic variables (variables with nega- 
tive correlation), achieving a reduction of the variance of the estimation. 
The performance of the new algorithm is tested by means of some ex- 
periments carried out over four large real-world networks. 

Keywords: Bayesian networks, probability propagation, importance 
sampling, antithetic variables. 



1 Introduction 

Bayesian networks provide a framework for efficiently dealing with complex mul- 
tivariate models. One of the most common tasks performed over them is the 
so-called probability propagation or probabilistic inference, which consists in ob- 
taining the probability distribution of some variables of interest given that the 
value of some other variables is known. 

Exact probabilistic inference in Bayesian networks may be infeasible in large
networks [2], which motivates the development of approximate algorithms, most
of them based on Monte Carlo simulation.

Propagation algorithms based on Monte Carlo methods can be divided into 
two groups: those using Gibbs sampling [10,12] and those using importance sam- 
pling [4,6,8,9,16,17]. However, when dealing with very large networks with ex- 
treme probabilities, only the most sophisticated of them are able to provide 
accurate results, namely blocking Gibbs sampling [10] and importance sampling 
based on approximate pre-computation [8,9,16]. In both cases, the goal is to 
draw samples from a probability distribution that is difficult to manage, in the 
sense that its size is too big.

It is known [3] that the problem of approximating probabilities in Bayesian
networks is NP-hard in the worst case. More precisely, many simulation
algorithms fail to provide good results in large networks with extreme
probabilities, because it can be very difficult to get a sample with positive
probability. This makes it necessary to study heuristic procedures for
propagating over large networks with extreme probabilities.

A class of these heuristic procedures is formed by the importance sampling
algorithms based on approximate pre-computation [9,16]. These methods first
perform a fast but inexact propagation, following a node removal process [18].
In this way, an approximate ‘a posteriori’ distribution is obtained. In
a second stage, a sample is drawn using the approximate distribution and the
probabilities are estimated according to the importance sampling methodology.

In this work we will rely on the basis of the importance sampling algorithm 
based on approximate pre-computation developed in [16]. One of the particulari- 
ties of that algorithm is the use of probability trees to represent and approximate 
probabilistic potentials. 

Probability trees use the regularities of the conditional distributions to reduce 
the space necessary to store them. The use of this representation instead of 
probability tables becomes more important when we cannot afford to compute 
exact values and we have to approximate the potentials. Probability trees have 
the possibility of approximating in an asymmetrical way, concentrating more 
resources (finer discrimination) where they are more necessary: higher values 
with more variability (see [16] for a deeper discussion on these issues). 

In this paper we introduce a Monte Carlo algorithm that improves impor- 
tance sampling based on approximate pre-computation by means of the use of 
antithetic variables during the simulation process. The performance of the new 
algorithm is compared to previous importance sampling in a series of experi- 
ments carried out over some large real-world networks. 

The paper is organized as follows: section 2 describes how probability
propagation can be carried out using the importance sampling technique. The use
of antithetic variables is introduced in section 3, pointing out the modifications
necessary to incorporate this new feature into the importance sampling algorithm.
In section 4 the performance of the new algorithm is evaluated according to the
results of some experiments where large networks have been used. The paper
ends with conclusions in section 5.

2 Importance Sampling in Bayesian Networks 

A Bayesian network is a directed acyclic graph where each node represents a 
random variable, and the topology of the graph shows the independence rela- 
tions among the variables, according to the d-separation criterion [13]. Given the 
independences attached to the graph, the joint distribution is determined giving 
a probability distribution for each node conditioned on its parents. 

Let $X = \{X_1, \ldots, X_n\}$ be the set of variables in the network, each
variable $X_i$ taking values on a finite set $\Omega_i$. If $I$ is a set of
indices, we will write $X_I$ for the set $\{X_i \mid i \in I\}$, and $\Omega_I$
will denote the Cartesian product $\times_{i \in I} \Omega_i$. Given
$x \in \Omega_I$ and $J \subseteq I$, $x_J$ is the element of $\Omega_J$
obtained from $x$ by dropping the coordinates not in $J$.

A potential $f$ defined on $\Omega_I$ is a mapping $f : \Omega_I \to \mathbb{R}_0^+$,
where $\mathbb{R}_0^+$ is the set of non-negative real numbers. Probabilistic
information will always be represented by means of potentials, as in [11]. The
set of indices of the variables on which a potential $f$ is defined will be
denoted as $\mathrm{dom}(f)$.

The conditional distribution of each variable $X_i$, $i = 1, \ldots, n$, given
its parents in the network, $X_{pa(i)}$, is denoted by a potential
$p_i(x_i \mid x_{pa(i)})$ where $p_i$ is defined over $\Omega_{\{i\} \cup pa(i)}$.

If $N = \{1, \ldots, n\}$, the joint probability distribution for the
n-dimensional random variable $X$ can be expressed as

$$p(x) = \prod_{i \in N} p_i(x_i \mid x_{pa(i)}) \quad \forall x \in \Omega_N. \tag{1}$$

An observation is the knowledge about the exact value $X_i = e_i$ of a variable.
The set of observations will be denoted by $e$, and called the evidence set. $E$
will be the set of indices of the variables observed.

The goal of probability propagation is to calculate the ‘a posteriori’
probability function $p(x'_k \mid e)$, $x'_k \in \Omega_k$, for every unobserved
variable $X_k$, $k \in N \setminus E$. This probability can be obtained from the
joint distribution (1), but in general, that joint distribution is not available
for large networks, since the number of values necessary to specify it grows
exponentially in the number of variables in the network.

Notice that the conditional probability $p(x'_k \mid e)$ is equal to
$p(x'_k, e)/p(e)$, and, since $p(e) = \sum_{x'_k \in \Omega_k} p(x'_k, e)$, we
can calculate the posterior probability if we compute the value $p(x'_k, e)$ for
every $x'_k \in \Omega_k$ and normalize afterwards.
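The normalization argument above can be checked by brute force on a toy network. The following is a minimal sketch, assuming a hypothetical three-variable network A -> B, A -> C with binary variables and made-up probability values (none of this comes from the paper); it enumerates the joint distribution of formula (1), which is only feasible because the network is tiny.

```python
# Hypothetical network A -> B, A -> C; all numbers are illustrative only.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}  # key: (b, a)
p_c_given_a = {(0, 0): 0.6, (1, 0): 0.4, (0, 1): 0.1, (1, 1): 0.9}  # key: (c, a)


def joint(a, b, c):
    """Formula (1): product of each conditional given its parents."""
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_a[(c, a)]


def posterior_b(c_obs):
    """p(b | C = c_obs): compute p(b, e) for every b, then normalize."""
    unnormalized = {
        b: sum(joint(a, b, c_obs) for a in (0, 1)) for b in (0, 1)
    }
    p_e = sum(unnormalized.values())  # p(e) is the sum over the states of B
    return {b: value / p_e for b, value in unnormalized.items()}


print(posterior_b(1))
```

The sum over `a` is the part that becomes intractable in large networks, which is what motivates the sampling estimators discussed next.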

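Let $H = \{p_i(x_i \mid x_{pa(i)}) \mid i = 1, \ldots, n\}$ be the set of
conditional potentials. Then, $p(x'_k, e)$ can be expressed as follows,

$$p(x'_k, e) = \sum_{\substack{x \in \Omega_N \\ x_E = e \\ x_k = x'_k}} \prod_{i \in N} p_i(x_i \mid x_{pa(i)}) = \sum_{\substack{x \in \Omega_N \\ x_E = e \\ x_k = x'_k}} \prod_{f \in H} f(x_{\mathrm{dom}(f)}) \tag{2}$$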

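If observations are incorporated by restricting potentials in $H$ to the observed
values, i.e. by transforming each potential $f \in H$ into a potential $f_e$
defined on $\mathrm{dom}(f) \setminus E$ as $f_e(x) = f(y)$, where
$y_{\mathrm{dom}(f) \setminus E} = x$, and $y_i = e_i$ for all $i \in E$,
then we have

$$p(x'_k, e) = \sum_{\substack{x \in \Omega_N \\ x_E = e \\ x_k = x'_k}} \prod_{f_e} f_e(x_{\mathrm{dom}(f_e)}) = \sum_{\substack{x \in \Omega_N \\ x_E = e \\ x_k = x'_k}} g(x), \tag{3}$$

where $g(x) = \prod_{f_e} f_e(x_{\mathrm{dom}(f_e)})$.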

Thus, probability propagation consists in estimating the value of the sum in
(3), and here is where the importance sampling technique is used.

Importance sampling is well known as a variance reduction technique for
estimating integrals by means of Monte Carlo methods (see, for instance, [14]),
consisting in transforming the sum in (3) into an expected value that can be
estimated as a sample mean. To achieve this, consider a probability function
$p^* : \Omega_N \to [0,1]$, verifying that $p^*(x) > 0$ for every point
$x \in \Omega_N$ such that $g(x) > 0$. Then formula (3) can be written as:

$$p(x'_k, e) = \sum_{g(x) > 0} \frac{g(x)}{p^*(x)}\, p^*(x) = E\left[\frac{g(X^*)}{p^*(X^*)}\right],$$



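where $X^*$ is a random variable with distribution $p^*$ (from now on, $p^*$ will
be called the sampling distribution). Then, if $\{x^{(j)}\}_{j=1}^{m}$ is a
sample of size $m$ taken from $p^*$,

$$\hat{p}(x'_k, e) = \frac{1}{m} \sum_{j=1}^{m} \frac{g(x^{(j)})}{p^*(x^{(j)})} \tag{4}$$

is an unbiased estimator of $p(x'_k, e)$ with variance

$$\mathrm{Var}(\hat{p}(x'_k, e)) = \frac{1}{m} \left( \sum_{g(x) > 0} \frac{g^2(x)}{p^*(x)} - p^2(x'_k, e) \right).$$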

Minimizing the error in unbiased estimation is equivalent to minimizing the 
variance, which is achieved using a sampling distribution proportional to g(x), 
and this is the same as knowing the exact posterior distribution. Thus, in prac- 
tical situations the best we can do is to obtain a sampling distribution as close 
as possible to the optimal one. 
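A short worked step, using only the definitions above, shows why a sampling distribution proportional to g is optimal and why it is unavailable in practice. Suppose the sampling distribution is g normalized over the configurations being summed in (3) (the symbol $y$ below is just a dummy summation variable):

$$p^*(x) = \frac{g(x)}{\sum_{y} g(y)} = \frac{g(x)}{p(x'_k, e)} \quad\Longrightarrow\quad \frac{g(x)}{p^*(x)} = p(x'_k, e) \ \text{ for every } x \text{ with } g(x) > 0.$$

Every sampled ratio is then the same constant, so the sample mean has zero variance. The normalizing constant, however, is exactly the value $p(x'_k, e)$ that the estimator is meant to approximate.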

One characteristic of this method, as formulated above, is that it requires a 
different sample taken from a different sampling distribution for each value of 
each variable, which is very inefficient. 

Salmerón, Cano and Moral [16] showed that it is possible to use a single
sample to estimate all posterior distributions, if the sampling distribution is
calculated as if it were going to be used to estimate $p(e)$ (see [16] for the
details).

Once $p^*$ is selected, $p(x'_k, e)$ for each value $x'_k$ of each variable
$X_k$, $k \in N \setminus E$, can be estimated with the following algorithm:



Importance Sampling

1. For j := 1 to m (sample size)

   a) Generate a configuration $x^{(j)} \in \Omega_N$ using $p^*$.

   b) Calculate

      $$w_j = \frac{\left(\prod_{i \in N} p_i(x_i^{(j)} \mid x_{pa(i)}^{(j)})\right) \cdot \left(\prod_{l \in E} \delta_l(x_l^{(j)}; e_l)\right)}{p^*(x^{(j)})} \tag{5}$$

2. For each $x'_k \in \Omega_k$, $k \in N \setminus E$, estimate $p(x'_k, e)$ as
   the average of the weights in formula (5) corresponding to configurations
   containing $x'_k$.

3. Normalize values $p(x'_k, e)$ in order to obtain $p(x'_k \mid e)$.
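The three steps above can be sketched on a toy example. This is an illustration only, not the implementation of [16]: the network (A -> B, A -> C, with C observed), its numbers, and the sampling distribution `Q_A` are all assumptions, and the sampling distribution is a hand-picked table rather than one obtained by approximate pre-computation.

```python
import random

# Hypothetical network A -> B, A -> C with binary variables; C is observed.
P_A = {0: 0.7, 1: 0.3}
P_B_GIVEN_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # indexed [a][b]
P_C_GIVEN_A = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}  # indexed [a][c]
Q_A = {0: 0.5, 1: 0.5}  # assumed sampling distribution for A


def draw(dist, rng):
    """Draw a state from a dict {state: probability} by inversion."""
    u, cumulative = rng.random(), 0.0
    for state, p in dist.items():
        cumulative += p
        if u < cumulative:
            return state
    return state  # reached only through floating-point round-off


def importance_sampling_posterior_b(c_obs, m, rng):
    """Estimate p(B | C = c_obs) from a single sample of size m."""
    totals = {0: 0.0, 1: 0.0}  # accumulated weights per state of B
    for _ in range(m):
        a = draw(Q_A, rng)                 # step 1a: configuration from p*
        b = draw(P_B_GIVEN_A[a], rng)
        p_star = Q_A[a] * P_B_GIVEN_A[a][b]
        # C is clamped to c_obs, so the evidence indicator in (5) is always 1.
        g = P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_A[a][c_obs]
        totals[b] += g / p_star            # step 1b: weight of formula (5)
    joint = {b: t / m for b, t in totals.items()}  # step 2: p(b, e) estimates
    p_e = sum(joint.values())
    return {b: v / p_e for b, v in joint.items()}  # step 3: normalize


print(importance_sampling_posterior_b(1, 10000, random.Random(0)))
```

For these made-up numbers the exact posterior of B = 0 is 0.306 / 0.55, so the printed estimate should be close to 0.556.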
3 Using Antithetic Variables

The technique of importance sampling is based on a good selection of the
sampling distribution p* in order to achieve variance reduction with respect to
plain Monte Carlo (in which the uniform distribution is used). However, there are
other ways of reducing the variance of the estimation; some of them are
reported in [14,15]. One of them is the use of antithetic variables.

A general setting for using antithetic variables is as follows: assume we want
to estimate a parameter $\theta$ and we have two unbiased estimators of $\theta$,
$\hat{\theta}_1$ and $\hat{\theta}_2$. Then,

$$\hat{\theta}_3 = \frac{\hat{\theta}_1 + \hat{\theta}_2}{2}$$

is an unbiased estimator of $\theta$ with

$$\mathrm{Var}(\hat{\theta}_3) = \frac{1}{4}\mathrm{Var}(\hat{\theta}_1) + \frac{1}{4}\mathrm{Var}(\hat{\theta}_2) + \frac{1}{2}\mathrm{Cov}(\hat{\theta}_1, \hat{\theta}_2). \tag{6}$$

Then a variance reduction is achieved by using $\hat{\theta}_3$ instead of
$\hat{\theta}_1$ or $\hat{\theta}_2$ if these estimators have a strongly negative
correlation.

The point here is how to induce negative correlation between $\hat{\theta}_1$
and $\hat{\theta}_2$. One way to do this is to generate the sample in such a way
that whenever a new value is generated from a random number U, another one is
generated from 1 - U. Then, one of the values is used to evaluate
$\hat{\theta}_1$ and the other for $\hat{\theta}_2$. We say that antithetic
variables are used when the sample is generated from negatively correlated pairs
of random numbers U and 1 - U.
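The effect of formula (6) can be observed numerically on a textbook integral. The sketch below is an assumption-laden illustration, not part of the paper: it estimates E[exp(U)] for U uniform on [0, 1), once with independent draws and once with antithetic pairs (U, 1 - U), and compares the empirical variance of the two estimators at the same total number of function evaluations.

```python
import math
import random
import statistics


def plain_estimate(rng, n_pairs):
    """Average of exp(U) over 2 * n_pairs independent uniform draws."""
    draws = [math.exp(rng.random()) for _ in range(2 * n_pairs)]
    return sum(draws) / len(draws)


def antithetic_estimate(rng, n_pairs):
    """Average of exp(U) and exp(1 - U) over n_pairs antithetic pairs."""
    total = 0.0
    for _ in range(n_pairs):
        u = rng.random()
        total += math.exp(u) + math.exp(1.0 - u)
    return total / (2 * n_pairs)


def variance_over_runs(estimator, runs=2000, n_pairs=50, seed=0):
    """Empirical variance of an estimator across independent repetitions."""
    rng = random.Random(seed)
    return statistics.pvariance([estimator(rng, n_pairs) for _ in range(runs)])


var_plain = variance_over_runs(plain_estimate)
var_anti = variance_over_runs(antithetic_estimate)
print("plain:", var_plain, "antithetic:", var_anti)
```

Because exp is monotone, exp(U) and exp(1 - U) are negatively correlated, so the covariance term in (6) is negative and the antithetic variance comes out smaller.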

This technique has been used for reducing the variance in Monte Carlo inte- 
gration in Bayesian inference [7]. However, in that work antithetic variables are 
used with plain Monte Carlo instead of importance sampling, though the author 
considers promising the application of them together with importance sampling. 
We will describe here how it can be done within the context of probability prop- 
agation in Bayesian networks. 

As described above, antithetic variables can be used to estimate just a single
probability value:

1. Call $\theta$ the probability value to estimate.

2. Obtain a sampling distribution p* in order to draw samples from the set of
configurations of the variables in the network.

3. Generate a sample of configurations of the variables in the network where
each pair of configurations in the sample is obtained from p*, one by inversion
of a random number U and the other by inversion of 1 - U.

4. Estimate parameter $\theta$ from the sample obtained, as in importance
sampling.

These four steps must be repeated for each state of each variable in the
network, which may be very inefficient in large networks. Furthermore, usually
it is not possible to obtain a complete configuration of the variables in the
network from a single random number, since it would require a joint sampling
distribution over all the variables in the network, whose size would be equal to
the size of the exact joint distribution in the network.

Because of that, importance sampling propagation algorithms based on ap- 
proximate computation do not simulate directly using the inverse of the dis- 
tribution function, but rather variables are simulated one at a time, using an 
individual sampling distribution and a different random number for each vari- 
able [16]. Also, the same sample is used to estimate the probability for each state 
of each variable in the network. 

Thus, we have applied a modified version of the technique of antithetic
variables: whenever a new variable is going to be simulated, use two numbers U
and 1 - U. In this way, the configurations are not actually generated from
antithetic variables, but nevertheless they are likely to be negatively
correlated. Thus, it is reasonable to expect a variance reduction, but perhaps
not as important as in the case of estimating the probability value of a single
state of a single variable.
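Simulating one discrete variable from U and from 1 - U can be sketched as follows. The function names and the three-state distribution are hypothetical; the sketch only illustrates inversion of the distribution function and the negative correlation it induces between the two simulated states.

```python
import random


def invert(probs, u):
    """Inverse-transform sampling: first index whose cumulative sum exceeds u."""
    cumulative = 0.0
    for state, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return state
    return len(probs) - 1  # guard for round-off and for u == 1.0


def antithetic_pair(probs, rng):
    """Simulate two states of the same variable from U and 1 - U."""
    u = rng.random()
    return invert(probs, u), invert(probs, 1.0 - u)


def covariance(pairs):
    """Empirical covariance between the first and second element of each pair."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n


rng = random.Random(0)
sample = [antithetic_pair([0.2, 0.5, 0.3], rng) for _ in range(20000)]
print("covariance of the paired states:", covariance(sample))
```

Each element of a pair is still distributed according to `probs`; only the dependence between the two elements changes.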

The sampling distribution for each variable can be obtained through a process
of eliminating variables in the set $H$ of potentials. An elimination order
$\sigma$ is considered and variables are deleted according to such order:
$X_{\sigma(1)}, \ldots, X_{\sigma(n)}$.

The deletion of a variable $X_{\sigma(i)}$ consists in combining all the
functions in $H$ which are defined for that variable, marginalizing afterwards
in order to remove $X_{\sigma(i)}$ by adding on the different values of this
variable. The potential obtained is inserted in $H$. More precisely, the steps
are as follows:

— Let $H_{\sigma(i)} = \{f \in H \mid \sigma(i) \in \mathrm{dom}(f)\}$.

— Calculate $f = \prod_{h \in H_{\sigma(i)}} h$ and $f'$ defined on
$\mathrm{dom}(f) - \{\sigma(i)\}$, by
$f'(x) = \sum_{x_{\sigma(i)} \in \Omega_{\sigma(i)}} f(x, x_{\sigma(i)})$.

— Transform $H$ into $H - H_{\sigma(i)} \cup \{f'\}$.
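One deletion step can be sketched with plain probability tables. This is a minimal illustration under stated assumptions: potentials are stored as exact dictionaries (the paper uses approximate probability trees instead), every variable is binary, and the example potentials are made up.

```python
from itertools import product

# A potential is a pair (variables, table): `variables` is a tuple of names and
# `table` maps each tuple of values (in the same order) to a non-negative number.
DOMAIN = (0, 1)  # assumption: every variable is binary


def combine(potentials):
    """Pointwise product, defined on the union of the potentials' variables."""
    variables = tuple(sorted({v for vs, _ in potentials for v in vs}))
    table = {}
    for values in product(DOMAIN, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        value = 1.0
        for vs, t in potentials:
            value *= t[tuple(assignment[v] for v in vs)]
        table[values] = value
    return variables, table


def sum_out(potential, var):
    """Marginalize `var` out by adding over its values."""
    variables, table = potential
    keep = tuple(v for v in variables if v != var)
    result = {}
    for values, value in table.items():
        key = tuple(x for v, x in zip(variables, values) if v != var)
        result[key] = result.get(key, 0.0) + value
    return keep, result


def delete_variable(H, var):
    """One deletion step: returns the new set H and the combined potential f."""
    h_var = [p for p in H if var in p[0]]
    rest = [p for p in H if var not in p[0]]
    f = combine(h_var)
    return rest + [sum_out(f, var)], f


# Hypothetical potentials p(A), p(B | A), p(C | A); tables are keyed by (A, B) etc.
H = [
    (("A",), {(0,): 0.7, (1,): 0.3}),
    (("A", "B"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}),
    (("A", "C"), {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.1, (1, 1): 0.9}),
]
new_H, f = delete_variable(H, "A")
print(new_H[0])
```

The returned `f` is the potential that would later be restricted and used to simulate the deleted variable.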

Simulation is carried out in the reverse of the order in which variables
are deleted. To obtain a value for $X_{\sigma(i)}$, we will use the function $f$
obtained in the deletion of this variable. This potential is defined for the
values of variable $X_{\sigma(i)}$ and other variables already sampled. Potential
$f$ is restricted to the already obtained values of variables in
$\mathrm{dom}(f) - \{\sigma(i)\}$, giving rise to a function which depends only
on $X_{\sigma(i)}$. Finally, a value for this variable is obtained with
probability proportional to the values of this potential.

The result of the combinations in the process of obtaining the sampling dis- 
tributions may require a big amount of space to be stored, and therefore approxi- 
mations are employed, either using probability tables [9] or probability trees [16] 
to represent the distributions. 

The use of antithetic variables is independent of the representation used. In 
the experiments reported in this work, we will concentrate on implementations 
based on probability trees, since they provide more accurate sampling distribu- 
tions. For a detailed discussion on the use of probability trees and the process 
of obtaining the sampling distributions using that representation we refer the 
reader to [16].

The propagation algorithm we propose, based on importance sampling using
antithetic variables (denoted isav) can be formulated as follows:

Algorithm isav

1. Let $H = \{p_i \mid i = 1, \ldots, n\}$ be the set of conditional
   distributions in the network.

2. Incorporate observations $e$ by restricting the functions in $H$ to $e$.

3. Select an order $\sigma$ of variables in G, as described in [16].

4. For i := 1 to n, obtain a sampling distribution $p^*_i$ for variable $X_i$.

5. For j := 1 to m/2 (to obtain a sample of size m),

   a) $wx_j$ := 1.0.

   b) $wy_j$ := 1.0.

   c) For i := n to 1,

      i. Generate a random number U.

      ii. Simulate a value $x^{(j)}_{\sigma(i)}$ for $X_{\sigma(i)}$ using
          $p^*_{\sigma(i)}$ as sampling distribution from U, and another value
          $y^{(j)}_{\sigma(i)}$ using 1 - U (before simulating, $p^*_{\sigma(i)}$
          is restricted to the configuration of the variables previously
          simulated).

      iii. Compute $wx_j := wx_j / p^*_{\sigma(i)}(x^{(j)}_{\sigma(i)})$ and
           $wy_j := wy_j / p^*_{\sigma(i)}(y^{(j)}_{\sigma(i)})$.

   d) Let $x^{(j)}$ and $y^{(j)}$ be the configurations obtained.

   e) Compute

      $$wx_j := wx_j \cdot \left(\prod_{i=1}^{n} p_i(x_i^{(j)} \mid x_{pa(i)}^{(j)})\right) \cdot \left(\prod_{l \in E} \delta_l(x_l^{(j)}; e_l)\right)$$

      and

      $$wy_j := wy_j \cdot \left(\prod_{i=1}^{n} p_i(y_i^{(j)} \mid y_{pa(i)}^{(j)})\right) \cdot \left(\prod_{l \in E} \delta_l(y_l^{(j)}; e_l)\right).$$

6. For each $x'_k \in \Omega_k$, $k \in N \setminus E$, estimate $p(x'_k, e)$ as
   the average of the weights $wx_j$ and $wy_j$, $j = 1, \ldots, m/2$,
   corresponding to configurations containing $x'_k$.

7. Normalize values $p(x'_k, e)$ to obtain $p(x'_k \mid e)$.
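The pairing of configurations in steps 5 to 7 can be sketched on a toy example. This is not the Elvira implementation: the network (A -> B, A -> C, with C observed), its numbers, and the per-variable sampling distributions are assumptions chosen for illustration, and the sampling distributions are simple tables rather than pruned probability trees.

```python
import random

# Hypothetical network A -> B, A -> C with binary variables; C is observed.
P_A = {0: 0.7, 1: 0.3}
P_B_GIVEN_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # indexed [a][b]
P_C_GIVEN_A = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}  # indexed [a][c]
Q_A = {0: 0.5, 1: 0.5}  # assumed sampling distribution for A


def invert(dist, u):
    """Return the state selected by inverting the distribution function at u."""
    cumulative = 0.0
    for state, p in dist.items():
        cumulative += p
        if u < cumulative:
            return state
    return state  # reached only through round-off or u == 1.0


def isav_posterior_b(c_obs, m, rng):
    """Estimate p(B | C = c_obs) from m // 2 pairs of configurations."""
    totals = {0: 0.0, 1: 0.0}
    pairs = m // 2
    for _ in range(pairs):
        wx = wy = 1.0                                   # steps 5a and 5b
        u = rng.random()                                # step 5c for A
        xa, ya = invert(Q_A, u), invert(Q_A, 1.0 - u)
        wx, wy = wx / Q_A[xa], wy / Q_A[ya]
        u = rng.random()                                # step 5c for B
        xb = invert(P_B_GIVEN_A[xa], u)
        yb = invert(P_B_GIVEN_A[ya], 1.0 - u)
        wx, wy = wx / P_B_GIVEN_A[xa][xb], wy / P_B_GIVEN_A[ya][yb]
        # Step 5e: multiply by the conditionals. C is clamped to c_obs, so the
        # evidence indicator is always 1 in this sketch.
        wx *= P_A[xa] * P_B_GIVEN_A[xa][xb] * P_C_GIVEN_A[xa][c_obs]
        wy *= P_A[ya] * P_B_GIVEN_A[ya][yb] * P_C_GIVEN_A[ya][c_obs]
        totals[xb] += wx                                # step 6
        totals[yb] += wy
    joint = {b: t / (2 * pairs) for b, t in totals.items()}
    p_e = sum(joint.values())
    return {b: v / p_e for b, v in joint.items()}       # step 7


print(isav_posterior_b(1, 10000, random.Random(0)))
```

A fresh random number is drawn for each variable, as in step 5c, so the two configurations of a pair are negatively correlated variable by variable rather than generated from a single U.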

4 Experimental Evaluation of the New Algorithm 

The performance of the new algorithm has been evaluated by means of several
experiments carried out over four large real-world Bayesian networks. The four
networks are called pedigree (441 variables), munin1 (189 variables), munin2
(1003 variables) and water (32 variables).

The networks have been borrowed from the Decision Support Systems group
at Aalborg University (Denmark) (www.cs.auc.dk/research/DSS/misc.html).

The performance of importance sampling using antithetic variables (isav) 
has been compared with importance sampling without this feature (is), using 
the same implementation as in [16].

The new algorithm has been implemented in Java, and included in the Elvira
shell (leo.ugr.es/~elvira).

The four networks chosen for the experiments are difficult for simulation
algorithms, since they contain extreme probabilities, and obtaining a
configuration with positive probability from them may become hard. Extreme cases
are the likelihood weighting method [17] and the bounded variance method [4]:
these methods do not even produce a result, since all the configurations in the
sample get a zero weight.

We have carried out two experiments with each network (with and without 
observations), with different sample sizes (5000, 7500, 10000, 12500 and 15000). 
In all of the experiments the maximum potential size has been set to 1000 values. 
The only exception was made with network water for which a second experiment 
was carried out with a maximum potential size of 2500 values. The motivation 
of this second experiment is explained later. The threshold for pruning the
probability trees has been set to ε = 0.01 (see [16]). This value of ε indicates
that values in the leaves of the tree whose difference with respect to a uniform
distribution is less than 1% are replaced by their average. Replacing values by
their average may cause some configurations whose probability is equal
to zero to have a positive probability after pruning. This means that it is
possible to obtain a configuration in the sample whose exact probability is
equal to zero, which implies that the configuration has to be discarded. As the
number of values used to represent a probabilistic potential (the potential
size) increases, the risk of discarding configurations decreases.

Each trial has been repeated 100 times to average the results. In each trial, 
we have calculated the computing time and error. 

For a single variable $X_i$, the error is measured as follows (see [5]):

$$G(X_i) = \sqrt{\frac{1}{|\Omega_i|} \sum_{a \in \Omega_i} \frac{(\hat{p}(a \mid e) - p(a \mid e))^2}{p(a \mid e)(1 - p(a \mid e))}} \tag{7}$$

where $p(a \mid e)$ is the true posterior probability, $\hat{p}(a \mid e)$ is the
estimated value and $|\Omega_i|$ is the number of states of variable $X_i$. For
a set of variables $X_I$, the error is:

$$G(X_I) = \sqrt{\sum_{i \in I} G(X_i)^2}. \tag{8}$$

This measure of error seems appropriate when we have extreme probabilities, 
since errors when estimating very low probability values are penalized. 
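The two error measures translate directly into code. The sketch below uses hypothetical function names and assumes that every true posterior probability lies strictly between 0 and 1, since formula (7) divides by p(a|e)(1 - p(a|e)).

```python
import math


def variable_error(true_post, est_post):
    """G(X_i) from formula (7); both arguments map states to probabilities."""
    total = 0.0
    for state, p in true_post.items():
        total += (est_post[state] - p) ** 2 / (p * (1.0 - p))
    return math.sqrt(total / len(true_post))


def set_error(errors):
    """G(X_I) from formula (8): root of the sum of squared per-variable errors."""
    return math.sqrt(sum(g * g for g in errors))


# The same absolute error of 0.01 is penalized more at an extreme probability.
print(variable_error({0: 0.5, 1: 0.5}, {0: 0.51, 1: 0.49}))
print(variable_error({0: 0.01, 1: 0.99}, {0: 0.02, 1: 0.98}))
```

The second call yields a larger error than the first, which is the penalization of low-probability errors described above.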

The experiments have been carried out on an AMD K7 800MHz computer,
with 512MB of RAM and operating system Linux 2.2.16. The Java virtual
machine used was Java 2 version 1.3.

The results of the experiments are reported in figures 1 to 5, where error is
represented versus computing time.

Fig. 1. Error vs. time for munin1 network without observed variables (left) and with
observed variables (right)

Fig. 2. Error vs. time for munin2 network without observed variables (left) and with
observed variables (right)



4.1 Results Discussion 

The experiments have shown a very good performance of algorithm isav, speeding
up the convergence to the exact results in problems with and without
observations. It must be pointed out that the is algorithm also provides
good approximations in all of the experiments, but adding the feature of using
antithetic variables improves the behaviour of the algorithm. Part of the
improvement is due to implementation aspects: generating two configurations
simultaneously is more efficient than simulating one after another.

However, there is an experiment in which the accuracy of isav decreases with
respect to is. This is due to the difficulty of propagating in network water with
the observations inserted. It happens that many configurations are not consistent
with the evidence and then they have to be discarded, as we explained above.
This has a double impact in the case of antithetic variables: in the process of
simulating a pair of configurations, if one of them is found to be inconsistent
with the evidence, then both configurations are discarded and the algorithm
starts to look for a new pair of configurations, even if the other configuration
was still consistent with the evidence.

Fig. 3. Error vs. time for pedigree network without observed variables (left) and with
observed variables (right)

Fig. 4. Error vs. time for water network without observed variables (left) and with
observed variables (right), taking a maximum of 1000 values for each potential

One way of avoiding this problem is not to discard the two configurations, 
but continue with the valid one until it is completed. 

We have not used this alternative in the experiments because we wanted 
to evaluate the impact of crude application of antithetic variables, in which 
the configurations should be always grouped in pairs of negatively correlated 
configurations. 

Instead, we have increased the maximum potential size up to 2500 values
in order to reduce the number of discarded configurations, obtaining the results
shown in figure 5.

Fig. 5. Error vs. time for water network without observed variables (left) and with
observed variables (right), taking a maximum of 2500 values for each potential



5 Conclusions 

We have introduced a modification over importance sampling algorithms for 
probabilistic propagation in Bayesian networks, based on the use of antithetic 
variables. 

The use of antithetic variables has been experimentally tested over four large
real-world networks, showing an important increase in the performance of the
algorithm: more accurate approximations are achieved in less time.

The use of variance reduction techniques such as importance sampling [9,16],
stratified sampling [1,9] and now antithetic variables seems to be a good way of
improving the accuracy of Monte Carlo propagation algorithms for Bayesian
networks.

Thus, we are planning to continue with the application of other variance
reduction techniques (common variables, for instance) to obtain better
propagation algorithms.



Abstract. Darwiche has recently proposed a graphical model for driv-
ing conditioning algorithms, called a dtree, which specifies a recursive
decomposition of a directed acyclic graph (DAG) into its families. A
main property of a dtree is its width, and it was shown previously how
to convert a DAG elimination order of width w into a dtree of width ≤ w.
The importance of this conversion is that any algorithm for construct- 
ing low-width elimination orders can be directly used for constructing 
low-width dtrees. We propose in this paper a more direct method for 
constructing dtrees based on hypergraph partitioning. This new method 
turns out to be quite competitive with existing methods in minimizing 
width. We also present methods for converting a dtree of width w into 
elimination orders and jointrees of no greater width. This leads to a new 
class of algorithms for generating elimination orders and jointrees (via
recursive decomposition).



1 Introduction 

Darwiche has recently proposed a graphical model, called a dtree, which specifies 
a recursive decomposition of a directed acyclic graph (DAG) into its families. The 
main application of dtrees is in driving a class of divide-and-conquer algorithms, 
called recursive conditioning, which can be used for anyspace probabilistic and 
logical reasoning [5,3,4]. Formally, a dtree is a full binary tree with its leaves 
corresponding to the DAG families (nodes and their parents). Figure 1(a) depicts 
a DAG and two corresponding dtrees. 

The quality of a dtree is measured by a number of parameters. The main
property of a dtree is its width. For example, if we have a belief network with n
variables, and if we can construct a dtree of width w for the network, then we can
answer probabilistic queries in O(n exp(w)) space and time. A dtree has other
important properties though. For example, if the height of a dtree is h, then we
can reason about the network in O(n) space and O(n exp(hw)) time. Therefore,
constructing dtrees with minimal width and height is quite important.

Existing methods for constructing dtrees for a DAG focus on initially
constructing a good elimination order for the DAG. It was previously shown how
to convert an elimination order of width w for DAG G into a dtree of width
≤ w for the same DAG [5], implying that any algorithm for constructing
low-width elimination orders is immediately an algorithm for constructing
low-width dtrees. It was also shown that any dtree can be balanced in
O(n log n) time, giving it O(log n) height, while only increasing its width by a
constant factor [5]. Therefore, to construct a dtree for linear-space reasoning,
one can compute an elimination order of small width, convert it to a dtree of no
greater width, and then balance the dtree to minimize its height.

S. Benferhat and P. Besnard (Eds.): ECSQARU 2001, LNAI 2143, pp. 180-191, 2001.

We report in this paper on a new method for constructing balanced dtrees.
The method is based on hypergraph partitioning, a well-studied problem with
applications to many areas, including VLSI design, efficient storage of databases
on disk, and data mining [11]. The goal here is to partition a hypergraph into
equally-sized parts, while minimizing the hyperedges which cross from one part
to another. Specifically, we show how the process of constructing balanced dtrees
for a DAG can be reduced to the process of recursively partitioning a hypergraph
based on the DAG.

Although the proposed method does not directly attempt to minimize the
dtree width, our experimental results show that from a width standpoint, it
generates dtrees that are competitive with those produced from elimination orders
based on the min-fill heuristic. Furthermore, the generated dtrees are superior
when considering other properties such as height.

A key point is that our algorithm for constructing dtrees has a much broader 
applicability, since any algorithm for producing low-width dtrees is immediately 
a good algorithm for producing low-width jointrees and elimination orders. It 
was shown previously that any dtree for a DAG can be immediately converted 
into a jointree for that DAG [5]. Therefore, our new method for constructing 
dtrees is immediately a method for constructing jointrees with similar properties, 
including width. We also show in this paper that each dtree of width w naturally 
determines a partial elimination order. Moreover, each (total) elimination order 
which is consistent with this partial order is guaranteed to have a width no 
greater than w. The implication of these results is that any method for recursively 
decomposing a DAG into a dtree can be used to produce elimination orders and 
jointrees for that DAG, with interesting guarantees on their qualities. 

This paper is structured as follows. We start in Section 2 by reviewing dtrees 
and their applications. We then introduce the problem of hypergraph partition- 
ing in Section 3, where we show how it can be used to obtain balanced dtrees. 
We then show in Section 4 how to convert dtrees of a certain width into elimina- 
tion orders and jointrees of no greater width. We next present our experimental 
results in Section 5 and finally close with some concluding remarks in Section 6. 

2 Dtrees 

A dtree (decomposition tree) is a full binary tree which induces a recursive 
decomposition on a directed acyclic graph. A dtree is used to drive divide-and- 
conquer algorithms, such as the algorithm of recursive conditioning for inference 
in Bayesian networks [5]. The following is the formal definition of a dtree.

Fig. 1. (a) A DAG and two corresponding dtrees. (b) A dtree and its cutsets (in italic).

Fig. 2. (a) A dtree with its contexts (in italic). (b) A dtree with its clusters (in italic).
The clusters of leaves are the families associated with these leaves.

Definition 1. A dtree $T$ for a DAG $G$ is a full binary tree, the leaves of which
correspond to the families of $G$.^1 If $t$ is a leaf node in dtree $T$ which
corresponds to family $F$ of DAG $G$, we define $vars(t) \stackrel{def}{=} F$.

Figure 1(a) depicts two dtrees for the DAG shown in the same figure. Examine 
the first dtree. The top level specifies a partition of the DAG families into two 
sets: {A,AB,AC} and {BCD,CE}. The left subtree specifies a partition of 
families {A, AB, AC}, while the right subtree specifies a partition of families 
{BCD,CE} (unique in this case). 

We will use $t_l$ and $t_r$ to denote the left child and right child of node $t$
in a dtree. Following standard conventions on binary trees, we will often not
distinguish between a node and the dtree rooted at that node. We will next define
a few more variable sets for each node in a dtree and then discuss their
applications.

Definition 2. [5] For an internal node $t$ in a dtree:

— The variables of $t$ are defined as
$vars(t) \stackrel{def}{=} vars(t_l) \cup vars(t_r)$.

— The cutset of $t$ is defined as
$cutset(t) \stackrel{def}{=} vars(t_l) \cap vars(t_r) - acutset(t)$, where
$acutset(t)$ is the union of cutsets associated with ancestors of $t$.

Moreover, for node $t$ in a dtree:

— The context of $t$ is defined as
$context(t) \stackrel{def}{=} vars(t) \cap acutset(t)$.

— The cluster of $t$ is defined as

$$cluster(t) \stackrel{def}{=} \begin{cases} vars(t), & \text{if } t \text{ is a leaf;} \\ cutset(t) \cup context(t), & \text{otherwise.} \end{cases}$$

The width of a dtree is defined as the size of its largest cluster minus 1.

^1 Recall that the family of node v in DAG G consists of v and its parents in G.



Figure 1(b) shows a dtree and its corresponding cutsets. Figure 2(a) shows the 
dtree contexts and Figure 2(b) shows its clusters. 
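The definitions of cutset, context, cluster and width can be computed with a short recursion. The sketch below is illustrative: the `Node` class is hypothetical, and the tree shape is an assumption built from the families A, AB, AC, BCD, CE mentioned in the text, so it is not claimed to reproduce the dtree drawn in the figures.

```python
class Node:
    """Dtree node: a leaf carries a family, an internal node two children."""

    def __init__(self, left=None, right=None, family=None):
        self.left, self.right, self.family = left, right, family

    def is_leaf(self):
        return self.family is not None


def variables(t):
    """vars(t): the family of a leaf, or the union over both subtrees."""
    if t.is_leaf():
        return frozenset(t.family)
    return variables(t.left) | variables(t.right)


def clusters(t, acutset=frozenset()):
    """Yield the cluster of every node in the dtree rooted at t."""
    if t.is_leaf():
        yield variables(t)
        return
    cutset = (variables(t.left) & variables(t.right)) - acutset
    context = variables(t) & acutset
    yield cutset | context
    # The ancestors' cutsets seen by the children include this node's cutset.
    for child in (t.left, t.right):
        yield from clusters(child, acutset | cutset)


def width(t):
    """Size of the largest cluster minus 1."""
    return max(len(c) for c in clusters(t)) - 1


# Assumed shape: ((A, (AB, AC)), (BCD, CE)).
root = Node(
    left=Node(left=Node(family="A"),
              right=Node(left=Node(family="AB"), right=Node(family="AC"))),
    right=Node(left=Node(family="BCD"), right=Node(family="CE")),
)
print(width(root))
```

For this assumed shape the root cutset is {B, C}, the largest cluster has three variables, and the width is 2.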

The cutsets of a dtree are used by conditioning algorithms to recursively
decompose a DAG-based model (such as a belief network) into smaller models
that can be solved independently. The contexts are used to cache results
obtained with respect to the smaller models, which reduces the running time of
conditioning algorithms but at the expense of using more space.^2 The clusters
are used to provide guarantees on the computational properties of conditioning
algorithms based on dtrees.

The ways in which cutsets and contexts are used by conditioning algorithms 
are outside the scope of this paper, but we refer the reader to [5, 3, 4] for details. 
Here, we only focus on the significance of these sets from a complexity viewpoint. 
Specifically, suppose that we have a belief network with DAG $G$ that contains
$n$ variables, and let $T$ be a dtree for $G$. Let $w_c$ be the size of the
largest cutset in $T$ (called the cutset width of $T$), $w_x$ be the size of the
largest context in $T$ (called the context width of $T$), $w$ be the width of
$T$, and $h$ be the height of $T$. We can then use the algorithm of recursive
conditioning given in [5] to compute the probability of some instantiation $e$
according to the following complexity:

— $O(n \exp(w))$ time and $O(n \exp(w_x))$ space; or

— $O(n \exp(h w_c))$ time and $O(n)$ space.

The above complexity results represent two extremes on a time-space tradeoff 
spectrum. In general, we can use any amount of space we have available, and 
still be able to predict the average running time of recursive conditioning [5]. 
Moreover, we can always balance a dtree so that its height becomes O(logn) 
while only increasing the sizes of its cutsets, contexts and clusters by a constant 
factor.^ Balanced dtrees are especially important if one wants to reason with 
belief networks under linear space as shown above. 

The main existing method for constructing dtrees is to convert an elimina- 
tion order of width w into a dtree of width < w [5]. We discuss in Section 3 a 
different class of algorithms for constructing (balanced) dtrees based on hyper- 
graph partitioning. This class of algorithms attempts to minimize the height and 
cutset width of dtrees. Yet, we shall present experimental results in Section 5 
showing that it produces very competitive dtrees from the standpoint of width 

^ This is done using the technique of memoization from dynamic programming. 

® The number of dtree nodes is always twice (minus one) the number of DAG nodes. 
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and context width, at least when compared with dtrees constructed based on 
elimination orders. Given that one can easily convert a dtree of width w into an 
elimination order or jointree of width < w, the proposed method has implica- 
tions on the construction of elimination orders and jointrees. This is discussed 
in Section 4. 
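To make the two extremes of the time-space tradeoff stated earlier in this section concrete, the following minimal sketch evaluates the dominant terms of both bounds for given dtree parameters. It is only an illustration under stated assumptions: exp(x) is read as base^x with base 2 (binary variables), the constants hidden by the O-notation are ignored, and the function name is hypothetical.

```python
def recursive_conditioning_costs(n, w, w_x, w_c, h, base=2):
    """Dominant terms of the two complexity extremes (constants ignored).

    n: number of variables, w: dtree width, w_x: context width,
    w_c: cutset width, h: dtree height.
    base=2 is an assumption (all variables binary).
    """
    # Extreme 1: O(n exp(w)) time and O(n exp(w_x)) space.
    with_caching = {"time": n * base ** w, "space": n * base ** w_x}
    # Extreme 2: O(n exp(h w_c)) time and O(n) space.
    linear_space = {"time": n * base ** (h * w_c), "space": n}
    return with_caching, linear_space


cached, linear = recursive_conditioning_costs(n=100, w=10, w_x=8, w_c=3, h=7)
print(cached)
print(linear)
```

The second bound grows with the product h w_c, which is why a height of O(log n) matters when only linear space is available.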



3 Dtree Construction as Hypergraph Partitioning 

Previous methods for constructing low-width dtrees have focused on using
existing heuristics to generate low-width elimination orders, then converting these
elimination orders to dtrees [5]. An alternative approach is to generate the dtrees
directly. The technique we now present uses hypergraph partitioning as a tool
for directly generating low-width dtrees.

A hypergraph is a generalization of a graph, such that an edge is permitted to 
connect an arbitrary number of vertices, rather than exactly two. The edges of a 
hypergraph are referred to as hyperedges. The problem of hypergraph partitioning 
is to find a way to split the vertices of a hypergraph into k approximately equal 
parts, such that the number of hyperedges connecting vertices in different parts 
is minimized [11]. 

The problem of hypergraph partitioning is well-studied, as it applies to many 
fields, including VLSI design, efficient storage of databases on disk, and data 
mining [11]. Since solving the problem optimally is at least NP-hard [9], much 
energy has been devoted to developing approximation algorithms for hypergraph 
partitioning. A paper by Alpert and Kahng [1] surveys a variety of the approaches
taken to this problem.

For our purposes, we used hMeTiS, a hypergraph partitioning package dis- 
tributed by the University of Minnesota [12]. Loosely speaking, hMeTiS col- 
lapses vertices and hyperedges of the original hypergraph to produce a smaller, 
aggregated hypergraph, then uses various specialized algorithms to partition the 
smaller hypergraph. After doing this, it uses specialized algorithms to construct a 
partition for the original, refined hypergraph using the partition for the smaller, 
aggregated hypergraph. Experimental results have shown that the partitions pro- 
duced by hMeTiS are consistently better than those produced by other popular 
algorithms [12]. In addition, hMeTiS is between one and two orders of magnitude 
faster than other algorithms [12]. One of the useful features of hMeTiS is that 
the user can specify how balanced the partition will be. Concretely, the user can 
specify that each part must contain no less than X% of the vertices. 

Generating a dtree for a DAG using hypergraph partitioning is fairly straight- 
forward. The first step is to express the DAG G as a hypergraph H: 

- For each family F in DAG G, we add a node N_F to H.

- For each variable V in DAG G, we add a hyperedge to H which connects all
  nodes N_F such that V ∈ F.



An example of this is depicted in Figure 3(a). 
Fig. 3. (a) From a DAG to a hypergraph. (b) An example bipartitioning of the
hypergraph into two subgraphs.
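The two construction rules for the hypergraph H can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the DAG is assumed to be given as a dictionary from each variable to the list of its parents (every parent must also appear as a key), and the vertex N_F is named after the variable whose family F it represents. The function name is hypothetical.

```python
def dag_to_hypergraph(parents):
    """Build the hypergraph H for a DAG given as {variable: [parents]}.

    Returns (families, hyperedges):
      families[X]   = family of X (X and its parents), i.e. vertex N_F
      hyperedges[V] = set of vertices whose family contains V
    """
    # One vertex per family.
    families = {x: frozenset([x, *ps]) for x, ps in parents.items()}
    # One hyperedge per variable, connecting the families that mention it.
    hyperedges = {v: set() for v in parents}
    for owner, family in families.items():
        for v in family:
            hyperedges[v].add(owner)
    return families, hyperedges


# A small diamond-shaped DAG: A -> B, A -> C, B -> D, C -> D.
dag = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
families, hyperedges = dag_to_hypergraph(dag)
print(hyperedges["A"])  # the families of A, B and C all mention A
```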



Notice that any full binary tree whose leaves correspond to the vertices of H
is a dtree for our DAG. This observation allows us to design a simple recursive
algorithm using hypergraph partitioning to produce a dtree. Figure 4 shows the
pseudocode for this algorithm. hgr2bdt starts by creating a dtree node t at
Line 01. Lines 02-05 correspond to the base case where hypergraph H contains a
single vertex N_F (corresponding to family F) and, hence, leads to a unique dtree
which contains the single leaf node t with vars(t) = F. Lines 06-08 correspond to
the recursive step where hypergraph H has more than a single vertex. Here,
we partition the hypergraph H into two subgraphs H_l and H_r, then recursively
generate dtrees HGR2BDT(H_l) and HGR2BDT(H_r) for these subgraphs, and finally
set these dtrees as the children of dtree node t.

hgr2bdt attempts to minimize the cutset of each node t it constructs at
Line 01. To see this, observe that every time we partition the hypergraph H into
H_l and H_r, we attempt to minimize the number of hyperedges that span the
partitions H_l and H_r. By construction, these hyperedges correspond to DAG
variables that are shared by families in H_l and those in H_r (which have not
already been cut by previous partitions). Hence by attempting to minimize the
number of hyperedges that span the partitions H_l and H_r, we are actually
attempting to minimize the cutset associated with dtree node t. Notice that we
do not make any direct attempt to minimize the width of the dtree. However,
we shall see in Section 5 that cutset minimization is a good heuristic for dtree
width minimization.

An advantage of this approach is that it also produces balanced dtrees, in the
sense that for any node in the dtree, the ratio of the number of leaves in its left
subtree to the number of leaves in its right subtree is bounded. This is a direct
consequence of the fact that hMeTiS computes balanced hypergraph partitions.
Thus the algorithm computes dtrees that have height of O(log n), where n is
the number of nodes in the given DAG.

One can attach “weights” to edges in a hypergraph and then instruct
hypergraph partitioning algorithms to minimize the sum of such weights in a
hypergraph cut. This is important when building dtrees for DAGs where variables have
different cardinalities. Specifically, suppose we have a variable V with N values
in a DAG G. When defining the hypergraph for G, we can define the weight of
the hyperedge representing variable V as log N. For each cut, the hypergraph
partitioning algorithm will thus try to minimize the sum of the weights of the
cut edges, which translates to minimizing the product of variable cardinalities
in the cut.

Algorithm hgr2bdt

HGR2BDT(hypergraph H)
01. create dtree node t
02. if H has only one vertex N_F,
03.   then vars(t) ← F
04.        t_l ← NULL
05.        t_r ← NULL
06. else partition H into two subgraphs H_l and H_r
07.      t_l ← HGR2BDT(H_l)
08.      t_r ← HGR2BDT(H_r)
09. return t

Fig. 4. Pseudocode for producing dtrees using hypergraph partitioning.

Fig. 5. Converting a dtree into an elimination order. The variables in bold/italic are
eliminated at the corresponding dtree nodes.
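The recursion of Figure 4 can be tried out on small inputs with the sketch below. It is an illustration under stated assumptions, not the implementation used in the experiments: the partitioner is a toy exhaustive search over near-even splits that stands in for hMeTiS (it is exponential and only usable on tiny hypergraphs), hyperedges cut at a node are dropped from the two subproblems (our reading of "not already been cut by previous partitions"), and all class and function names are hypothetical. The weight of a hyperedge is taken to be the base-2 logarithm of the variable's cardinality; the base is an assumption.

```python
import math
from itertools import combinations


class DtreeNode:
    def __init__(self, variables=None, left=None, right=None):
        self.vars = variables  # set only at leaves: the family F
        self.left = left
        self.right = right


def cut_weight(part, vertices, hyperedges, weight):
    """Total weight of the hyperedges with vertices on both sides of the split."""
    part = set(part)
    rest = set(vertices) - part
    return sum(weight[v] for v, nodes in hyperedges.items()
               if nodes & part and nodes & rest)


def bipartition(vertices, hyperedges, weight):
    """Toy stand-in for hMeTiS: best near-even split by exhaustive search."""
    ordered = sorted(vertices)
    half = len(ordered) // 2
    best = min(combinations(ordered, half),
               key=lambda p: cut_weight(p, ordered, hyperedges, weight))
    left = set(best)
    return left, set(ordered) - left


def hgr2bdt(vertices, families, hyperedges, weight):
    """Recursive dtree construction following Fig. 4; returns the root node."""
    vertices = set(vertices)
    if len(vertices) == 1:                      # Lines 02-05: single vertex N_F
        (owner,) = vertices
        return DtreeNode(variables=set(families[owner]))
    left, right = bipartition(vertices, hyperedges, weight)   # Line 06
    # Hyperedges cut here are not considered again further down (assumption).
    uncut = {v: nodes for v, nodes in hyperedges.items()
             if not (nodes & left and nodes & right)}
    return DtreeNode(left=hgr2bdt(left, families, uncut, weight),     # Line 07
                     right=hgr2bdt(right, families, uncut, weight))   # Line 08


def leaves(t):
    """Leaf nodes of a dtree, left to right."""
    return [t] if t.left is None else leaves(t.left) + leaves(t.right)


# Hypergraph of the diamond DAG A -> B, A -> C, B -> D, C -> D.
families = {"A": {"A"}, "B": {"A", "B"}, "C": {"A", "C"}, "D": {"B", "C", "D"}}
hyperedges = {"A": {"A", "B", "C"}, "B": {"B", "D"}, "C": {"C", "D"}, "D": {"D"}}
cardinality = {"A": 2, "B": 2, "C": 2, "D": 2}
weight = {v: math.log2(n) for v, n in cardinality.items()}
root = hgr2bdt(set(families), families, hyperedges, weight)
print([sorted(leaf.vars) for leaf in leaves(root)])
```

Replacing `bipartition` with a call to a real partitioner, and passing its balance setting through, is the only change needed to move beyond toy inputs.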



4 From Dtrees to Elimination Orders and Jointrees

In this section, we discuss width-preserving transformations from dtrees to elim- 
ination orders, and from dtrees to jointrees. The implication of such transforma- 
tions is that any algorithm for constructing low-width dtrees is immediately an 
algorithm for constructing low-width elimination orders and jointrees. We will 
begin our discussion by reviewing the concept of an elimination order. 

An elimination order of an undirected graph G is an ordering π(1), π(2), ..., π(n)
of the nodes in G. One of the simplest ways of defining the width w of order
π is constructively. Simply eliminate nodes π(1), π(2), ..., π(n) from G in that
order, connecting all neighbors of a node before eliminating it. The maximum
number of neighbors that any eliminated node has is then the width of order π.
The width of an elimination order with respect to a DAG G is defined as
its width with respect to the moral graph of G, that is, the graph which results
from connecting all parents of each node, and then dropping the directionality
of edges.
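The constructive definition of width translates directly into code. The sketch below is a minimal illustration (function names are hypothetical): it builds the moral graph of a DAG given as a dictionary from each variable to its parents, then eliminates nodes in the given order while recording the largest number of neighbors seen.

```python
from itertools import combinations


def moral_graph(parents):
    """Undirected moral graph of a DAG given as {variable: [parents]}."""
    adj = {v: set() for v in parents}
    for child, ps in parents.items():
        for p in ps:                         # drop the direction of each edge
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(ps, 2):     # connect all parents of a node
            adj[p].add(q)
            adj[q].add(p)
    return adj


def order_width(adj, order):
    """Width of an elimination order on an undirected graph."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    width = 0
    for v in order:
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))
        for a, b in combinations(nbrs, 2):   # connect the neighbors of v
            adj[a].add(b)
            adj[b].add(a)
        for n in nbrs:                       # then remove v from the graph
            adj[n].discard(v)
    return width


chain = {"A": [], "B": ["A"], "C": ["B"]}
print(order_width(moral_graph(chain), ["A", "B", "C"]))  # 1
print(order_width(moral_graph(chain), ["B", "A", "C"]))  # 2
```

The two calls show that the width depends on the order and not only on the graph.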

Elimination orders are the basis of an important class of algorithms, known
as variable elimination algorithms [7, 15]. They are also the basis for constructing
jointrees [10, 14]. In both cases, the complexity of the algorithms is exponential
only in the width w of the given elimination order. Hence, generating low-width
elimination orders is critical for the efficiency of these algorithms.

An algorithm is presented in [5] for converting an elimination order of width
w into a dtree of width ≤ w. The method allows one to capitalize on algorithms
for constructing low-width elimination orders in order to construct low-width
dtrees. Here, we present a result which allows us to do the opposite. Specifically,
we show how a dtree of width w can be used to induce elimination orders of
width ≤ w. In fact, we show that each dtree specifies a partial elimination order,
and any total order consistent with it is guaranteed to have no greater width.

Definition 3. Let T be a dtree for DAG G. We say that node v of G is eliminated
at node t of T precisely when v ∈ cluster(t) − context(t).

Note that for an internal node t, cluster(t) − context(t) is precisely cutset(t) [5].
Figure 5 depicts a dtree and the DAG nodes eliminated at each of its nodes.

It is actually not hard to prove that every DAG node is eliminated at some
unique dtree node [6]. This allows us to define a partial elimination order, where
for each pair of DAG nodes v and u, we have v < u if the dtree node at which v is
eliminated is a descendant of the dtree node at which u is eliminated.

In the dtree of Figure 5, we have C < E < A < {D, F}. We also have
H < E, B < A and G < {D, F}. Any total elimination order consistent with
these constraints is guaranteed to have no greater width than that of the dtree.

Theorem 1. [6] Let T be a dtree of width w for DAG G and let π be a total
elimination order for G which is consistent with the partial elimination order
defined by T. The width of π is then ≤ w.

The following two orders are consistent with the dtree in Figure 5:
<C, H, E, B, A, G, D, F> and <H, C, B, E, G, A, F, D>. Each of these
elimination orders has width 2. It is easy to generate an elimination order which is
consistent with a given dtree through a post-order traversal of the dtree.

Therefore, if we have an algorithm for constructing low-width dtrees, then we
immediately have an algorithm for constructing low-width elimination orders.
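The post-order traversal mentioned above can be sketched as follows. The sketch relies on the dtree definitions of [5] as we understand them, which are assumptions here since they are not restated in this section: for an internal node t, cutset(t) is the set of variables shared by the two children minus the cutsets of the ancestors of t, and the variables eliminated at a leaf are those of its family that appear in no ancestor cutset. The class and function names are hypothetical.

```python
class Node:
    def __init__(self, variables=None, left=None, right=None):
        self.left, self.right = left, right
        # vars(t): the family at a leaf, the union of the children otherwise.
        self.vars = set(variables) if variables is not None else left.vars | right.vars


def elimination_order(t, acutset=frozenset()):
    """Post-order traversal emitting the variables eliminated at each dtree node.

    acutset is the union of the cutsets of the ancestors of t.
    """
    if t.left is None:
        # Leaf: eliminate the family variables not cut higher up.
        return sorted(t.vars - acutset)
    cutset = (t.left.vars & t.right.vars) - acutset
    below = acutset | cutset
    # Children first, then the variables eliminated at t itself.
    return (elimination_order(t.left, below)
            + elimination_order(t.right, below)
            + sorted(cutset))


# Dtree over the families of the diamond DAG A -> B, A -> C, B -> D, C -> D.
root = Node(left=Node(left=Node({"A"}), right=Node({"A", "B"})),
            right=Node(left=Node({"A", "C"}), right=Node({"B", "C", "D"})))
print(elimination_order(root))  # ['D', 'C', 'A', 'B']
```

Variables eliminated at the same node are emitted in alphabetical order, which is one arbitrary choice among the total orders consistent with the partial order.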

A similar result exists for converting a dtree of width w into a jointree of the
same width [5]. We review the result here as it allows us to put the experimental
results of Section 5 in broader perspective. We start with the formal definition
of a jointree.

^ If DAG variables have different cardinalities, we can also define the weighted width
of an elimination order π. Specifically, for a set of variables S = V_1, ..., V_m with
cardinalities N_1, ..., N_m, the weight of S is defined as log_2 (N_1 ⋯ N_m). For an
eliminated node X, let w_X be the weight of the set that contains X and its
neighbors. The weighted width of an order is then defined as the maximum w_X that
any eliminated node X has.

A jointree for DAG G is a pair (T, C), where T is a tree and C labels each
node in T with a subset of nodes in G such that

1. Each family of DAG G is contained in some label C(v).

2. For every three nodes v, u and w in T, if w is on the path connecting v and
u, then C(v) ∩ C(u) ⊆ C(w).

Each label C(v) is called a cluster, and the width of a jointree is defined as the
size of its largest cluster minus one. Another important aspect of a jointree is
its separators: for each edge (u, v) in the jointree, one defines the separator as
C(u) ∩ C(v). The running time of algorithms based on jointrees is exponential
in the width. Their space complexity, however, can be exponential only in the
size of the separators.

It is shown in [5] that if T is a dtree for a DAG G, and if C is a function that
maps each node in dtree T to its cluster (as defined in Definition 2), then (T, C)
is a jointree for DAG G (see Figure 2(b)). Moreover, the context of a node t in
T is the separator on the edge connecting t to its parent in T (see Figure 2(a)).
This means that one can easily convert a dtree into a jointree of the same width.
It also means that if the dtree has small contexts, then the jointree will have
small separators. Finally, if the dtree is balanced, then the jointree it induces
will also be balanced in the following sense. We can choose a jointree node (call
it the root) so that the distance from the root to any jointree leaf is O(log n),
where n is the number of DAG nodes.



5 Experimental Results 

We compare experimentally in this section two methods for constructing dtrees: 
one based on elimination orders and another based on hypergraph partitioning. 
The first method generates unbalanced dtrees, while the second generates bal- 
anced ones. As long as the two methods are comparable with regards to the width 
of dtrees they generate, we will prefer balanced dtrees. There are many heuristics 
for generating low-width elimination orders [13], but it is well accepted that the
min-fill heuristic is among the best. This is the one we use in our experiments.
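For readers unfamiliar with the baseline, the following is a minimal sketch of the min-fill heuristic in its standard formulation: repeatedly eliminate the node whose elimination would add the fewest fill-in edges among its neighbors. It is not the code used in our experiments; the function name is hypothetical, and ties are broken alphabetically, which is an arbitrary choice.

```python
from itertools import combinations


def min_fill_order(adj):
    """Elimination order from the min-fill heuristic.

    adj: undirected graph as {node: set of neighbors}.
    """
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    order = []
    while adj:
        def fill(v):
            # Number of edges that eliminating v would add.
            return sum(1 for a, b in combinations(adj[v], 2) if b not in adj[a])

        v = min(sorted(adj), key=fill)            # alphabetical tie-breaking
        nbrs = adj.pop(v)
        for a, b in combinations(nbrs, 2):        # add the fill-in edges
            adj[a].add(b)
            adj[b].add(a)
        for n in nbrs:
            adj[n].discard(v)
        order.append(v)
    return order


cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
print(min_fill_order(cycle))  # ['A', 'B', 'C', 'D']
```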

To build dtrees for our set of benchmark suites with hgr2bdt, we implemented
hgr2bdt in C++ using the Standard Template Library, as well as the
hMeTiS hypergraph partitioning package from the University of Minnesota [12].
Recall that hMeTiS allows the user to specify how balanced each partition will 
be. We varied this parameter such that hMeTiS could produce bipartitions of 
maximum ratio 51-49, 60-40, 70-30, 80-20, and 90-10. For example, for ratio 60- 
40, the larger part of the bipartition could be comprised of at most 60% of the 
vertices of the original hypergraph. Since hMeTiS is also nondeterministic, we 
ran 5 trials at each balance setting, and then took the best dtree (in terms of 
width) from the 25 total trials. This is the dtree that we report in our results. 
Table 1. Statistics for ISCAS’85 Benchmark Circuits.

         hgr2bdt                       Min-fill (Unbalanced)         Min-fill (Balanced)
Circuit  Width  Context Width  Height  Width  Context Width  Height  Width  Context Width  Height
c432     27     23             11      27     23             16      27     23             12
c499     22     19             13      24     25             47      31     25             13
c880     23     22             13      25     24             42      29     25             16
c1355    22     19             24      24     25             49      31     25             16
c1908    44     32             13      50     43             23      51     46             16
c2670    33     29             22      37     32             39      37     29             19
c3540    74     61             15      97     81             73      97     81             19
c5315    52     49             16      45     44             79      53     51             19
c6288    46     38             35      53     43             48      53     43             19
c7552    42     35             17      48     37             41      51     42             21

Table 2. Results for Suite of Belief Networks.

          hgr2bdt               Min-fill (Unbalanced) Min-fill (Balanced)   Weighted Width
Network   Width   Height        Width   Height        Width   Height        hgr2bdt   Min-fill (Unbalanced)
barley    7       10            7       19            8       9             25.41     23.37
diabetes  7       11            4       53            9       14            24.70     17.23
link      16      17            15      33            19      17            27.00     24.00
mildew    5       7             4       13            7       8             24.50     20.74
munin1    11      10            11      31            12      12            25.19     28.03
munin2    9       13            7       47            9       17            22.16     18.10
munin3    8       16            7       35            10      17            21.84     17.25
munin4    9       13            8       37            10      18            23.75     21.38
pigs      11      11            10      38            14      14            19.02     17.43
water     10      7             10      12            10      9             20.34     20.75

Our first suite of DAGs is obtained from the ISCAS’85 benchmark circuits 
[2]. These circuits have been studied by El Fattah and Dechter in [8], wherein 
elimination orders were generated using several well-known heuristics. We found 
that min-fill produced better orders than any of the heuristics surveyed in [8]. 
Hence we used min-fill to construct elimination orders for these circuits, then 
constructed dtrees based on these orders in the manner described by [5]. We also
constructed dtrees using hgr2bdt. The results are reported in Table 1. A third
class of dtrees is also reported, which results from balancing the first class using
a technique described in [5].

A number of observations are in order here. First, if all we care about is gen- 
erating low-width elimination orders, then constructing a dtree using hgr2bdt 
and extracting an elimination order from it is almost always (much) better than 
using the min-fill heuristic. A particularly dramatic example of this is c3540, for 
which hgr2bdt was able to produce an elimination order of width 74. By con- 
trast, min-fill produced an elimination order of width 97, while the best heuristic 
surveyed by El Fattah and Dechter in [8] produced an elimination order of width 
114. Interestingly enough, these dtrees not only lead to better elimination orders, 
but are also balanced and tend to have smaller contexts. Therefore, hgr2bdt 
appears to be favorable for constructing dtrees and jointrees as well, for which 
other properties (beyond width) are of interest. 

Our second class of DAGs is obtained from belief networks posted at
http://www.cs.huji.ac.il/labs/compbio/Repository/; see Table 2. For these networks,
the min-fill heuristic (without balancing) did better overall than either of the
two methods that generate balanced dtrees. So we are paying a price here for
balance, although it does not seem to be too high. The highest price appears
to be for network diabetes, which has 413 nodes and whose dtree height went
from 53 to 11 as a result of balancing. What is clear though is that generating
balanced dtrees using hgr2bdt appears to be superior to generating dtrees using
an elimination order and then balancing them.

Table 3. Results for Randomly Generated DAGs.

                            hgr2bdt                       Min-fill (Unbalanced)         Min-fill (Balanced)
Nodes  Edge prob.  Version  Width  Context Width  Height  Width  Context Width  Height  Width  Context Width  Height
200    .015        1        22     20             10      22     21             37      22     21             13
                   2        34     28             10      32     31             42      33     31             13
                   3        28     23             11      28     26             35      31     27             13
                   4        29     25             10      29     28             42      30     28             13
                   5        29     26             10      27     26             38      29     26             13
300    .008        1        29     25             11      31     30             47      31     30             14
                   2        24     21             11      24     23             37      25     23             13
                   3        33     29             11      33     33             50      35     32             14
                   4        29     25             11      31     30             40      31     30             15
                   5        30     27             11      31     30             49      32     31             14
400    .005        1        21     19             11      21     20             50      23     20             14
                   2        20     18             11      20     19             45      21     20             15
                   3        17     16             11      15     15             42      18     19             15
                   4        18     16             11      18     19             42      21     19             15
                   5        22     19             11      24     23             44      24     23             15
500    .004        1        16     15             11      16     14             39      16     14             15
                   2        21     20             14      22     21             44      24     22             15
                   3        23     21             12      24     22             51      24     22             14
                   4        26     23             11      25     23             48      28     22             15
                   5        23     22             11      23     22             32      23     22             16

This second suite of DAGs is our only testing suite with variables of differ- 
ing cardinalities. Hence, we also ran the weighted version of our heuristic (as 
described at the end of Section 3) on this suite and compared the resulting elim- 
ination orders with min-fill. Again, min-fill generally does better on this suite 
with regards to this new evaluation criterion. 

Our third suite of DAGs is generated randomly according to the given prob- 
abilities of edges; see Table 3. For this suite, the use of hgr2bdt for generating 
dtrees, jointrees and elimination orders seems to produce the best results overall, 
considering width, context width and height. 

It is worth noting that the execution time of hgr2bdt is reasonable. For 
the largest network in our testing set, c7552 (a network with 7230 vertices), 
hgr2bdt takes approximately 5 minutes to produce a dtree on a Pentium II 
266. For most of the smaller networks, the execution time of hgr2bdt is only a 
matter of seconds. 

6 Conclusion 

This paper rests on two contributions, one theoretical and another practical. 
Theoretically, we have shown how methods for recursively decomposing DAGs 
can be used to construct elimination orders, dtrees and jointrees. Practically, we 
have proposed and evaluated the use of a state-of-the-art system for hypergraph 
partitioning to recursively decompose DAGs and, hence, to construct elimination 
orders, dtrees and jointrees. The new method appears to be different from current
tradition in automated reasoning, where elimination orders are the basis of
constructing various graphical models. There are many heuristics for generating
low-width elimination orders, and it is customary for automated reasoning systems
to give the user a choice of which one to use since even a small reduction
in width can have a drastic practical effect. Our experimental results suggest
that the construction of graphical models based on hypergraph partitioning
should clearly be considered as one of these choices, whether one is interested in
elimination orders, jointrees, or dtrees.
Abstract. In this paper^ we examine the ability to perform causal reasoning
with recursive equilibrium models. We identify a critical postulate, which we
term the Manipulation Postulate, that is required in order to perform causal
inference, and we prove that there exists a general class T of recursive
equilibrium models that violate the Manipulation Postulate. We relate this class
to the existing phenomenon of reversibility and show that all models in T
display reversible behavior, thereby providing an explanation for reversibility and
suggesting that it is a special case of a more general and perhaps widespread 
problem. We also show that all models in T possess a set of variables V' whose 
manipulation will cause an instability such that no equilibrium model will ex- 
ist for the system. We define the Structural Stability Principle which provides 
a graphical criterion for stability in causal models. Our theorems suggest that 
drastically incorrect inferences may be obtained when applying the Manip- 
ulation Postulate to equilibrium models, a result which has implications for 
current work on causal modeling, especially causal discovery from data. 



1 Introduction 

Manipulation in causal models originated in the early econometrics literature [9, 12]
in the context of structural equation models, and has recently been studied in
artificial intelligence, building a sound theory from some basic axioms and assumptions
regarding the nature of causality [10, 7]. This work has resulted in the development
of the Manipulation Theorem [10] and in sound and complete axiomatizations for
causality [3], including the development of a new language for causal reasoning [4].

Critical to these formalisms is the assumption that when some variable in the
model is manipulated, the net result from a structural standpoint will be the removal
of arcs coming into that variable. In this paper we label this fundamental assumption
the Manipulation Postulate. The Manipulation Postulate, which will be formally
defined in Section 2, is based on our conception of what a “causal model” is together
with our conception of what it means to “manipulate” a variable. As intuitive as
this idea is, there are a few simple physical examples that have been suggested [10, 1]
which seem to violate the Manipulation Postulate; in particular, systems have been
identified which appear to be reversible. Neither a formal analysis of why reversibility
occurs nor an indication of how widespread the problem is has been presented in the
causality literature. For these reasons the problem of reversibility has been widely
ignored by researchers in causality.^

** Currently with ReasonEdge Technologies, Pte, Ltd, 438 Alexandra Road, #03-01A
Alexandra Point, Singapore 119958, Republic of Singapore, mjdruzdzel@reasonedge.com.
^ An extended version of this paper is being submitted to the Journal of Artificial
Intelligence Research.

In this paper, we identify a class T of recursive equilibrium models that are 
guaranteed to violate the Manipulation Postulate; and in more complicated ways 
than merely reversing arcs under manipulation. Rather than relying on examples 
to demonstrate the existence of this class, this work is unique in that it provides a 
mathematical proof that T ≠ ∅ based on the existence of dynamic (time-dependent)
models that possess recursive equilibrium counterparts. We show that the set of mod- 
els which belong to T is surprisingly large, encompassing a wide array of the most 
common physical systems. We also show that every model in T displays reversibility, 
thereby providing a mathematical basis for this phenomenon and a set of sufficient 
conditions for it to occur, while at the same time indicating that it is a more general 
and perhaps widespread problem than previously suspected. 

Our proofs rely on the results of Iwasaki and Simon [5] who apparently were the 
first to discuss the relationship between dynamic causal models and recursive equi- 
librium causal models. However, there has been other work relating dynamic models 
to non-dynamic models in general: Fisher [2] discusses the relationship between a 
time-varying model and its time-averaged counterpart; Kuipers [6] discusses tempo- 
ral abstraction in dynamic qualitative models with widely varying time-scales; and 
Richardson [8] discusses the relationship between independencies in dynamic models 
and in non-recursive equilibrium causal models. Due to space limitations, some proofs 
are only sketched below; however, full proofs are available in an online appendix at: 
http://www.sis.pitt.edu/~ddash/papers/caveats/appendix.ps.

We will use the following notation throughout the remainder of the paper: If
G = (V, A) is a directed graph with vertex set V and arc set A, we will use Pa(v)_G
and Ch(v)_G to denote the parents and children, respectively, in G, for some v ∈ V.
We will use Anc(v)_G and Des(v)_G to denote the ancestors and descendants of a
variable v in graph G. If e is an equation then we use Params(e) to denote the set
of variables contained in e. If E is a set of equations, we use Params(E) to represent
⋃_{e ∈ E} Params(e).

2 Causal Models 

We are considering causal models, in the form of structural equation models, whereby
a system is summarized by a set of feature variables V, relations are specified by a
set of equations E which determine unique solutions for all v ∈ V, and each variable
v ∈ V is associated with a single unique equation e ∈ E:

Definition 1 (total causal mapping). A total causal mapping over E is a bijection
φ : V → E, where E is a set of n equations with V = Params(E). Obviously φ can
be written equivalently as a list of associations: {(v_1, e_1), (v_2, e_2), ..., (v_n, e_n)}.

The notion of a set of equations being “self-contained” is defined precisely in [9]
and [5]. Roughly, the term means that the equations in the set are logically independent
(no equation can be derived from other equations in the set) and all parameters are
identifiable. We will use the terms “structural equation model” and “causal model”
interchangeably:

^ Galles and Pearl [3] and subsequently Halpern [4] prove a theorem which they label 
“reversibility”; however this concept of reversibility has nothing to do with our concept. 
In particular, their theorem assumes that the Manipulation Postulate holds. 
e1: f1(w, z, v, x) = 0    (mapped to w)
e2: x = x0                (mapped to x)
e3: f3(x, y, z) = 0       (mapped to z)
e4: f4(y, x) = 0          (mapped to y)
e5: f5(v) = 0             (mapped to v)

Fig. 1. An example causal model.

Definition 2 (structural equation model). A structural equation model M is a
triple M = (V, E, φ), where E is a self-contained set of equations over parameters V,
and φ : V → E is a total causal mapping.

A structural equation model can be used to represent a joint probability distribution
over the variables by including in each equation dependence on an independent
random variable that represents the external, non-modeled factors that may introduce
noise into the system. It is sufficient for our purposes to consider only models such
that any equation e ∈ E can be freely inverted for any variable v ∈ Params(e) so that
v can be written as a function of the remaining parameters of e, e.g., v = f(Pa(v)).
An example of such a model is shown in Figure 1. Such a causal model defines a
directed graph G by directing an edge, p → v, for each p ∈ Params(e) \ {v}, where
e is the equation mapped to v.

It will be sufficient for the purposes of this paper to consider recursive models
only:

Definition 3 (recursive causal model). A causal model M = (V, E, φ) with a
causal graph G is recursive if and only if G is acyclic.
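The construction of the causal graph from a total causal mapping, and the test for recursiveness, can be sketched as follows. This is a minimal illustration under stated assumptions: equations are represented only by their parameter sets, the three-equation model in the example is hypothetical (it is not the model of Figure 1), and the function names are ours.

```python
def causal_graph(params, mapping):
    """Arc set of the causal graph.

    params:  {equation name: set of variables in it}, i.e. Params(e)
    mapping: {variable: equation name}, a total causal mapping
    Each variable v receives an arc p -> v from every other parameter p
    of the equation mapped to v.
    """
    return {(p, v) for v, e in mapping.items() for p in params[e] - {v}}


def is_recursive(variables, arcs):
    """True iff the graph is acyclic, by repeatedly peeling parentless nodes."""
    remaining = set(variables)
    arcs = set(arcs)
    while remaining:
        roots = {v for v in remaining if not any(c == v for _, c in arcs)}
        if not roots:
            return False          # every remaining node has a parent: a cycle
        remaining -= roots
        arcs = {(p, c) for p, c in arcs if p not in roots}
    return True


# Hypothetical model: x is exogenous, z depends on x, w depends on x and z.
params = {"e1": {"w", "z", "x"}, "e2": {"x"}, "e3": {"x", "z"}}
mapping = {"w": "e1", "x": "e2", "z": "e3"}
arcs = causal_graph(params, mapping)
print(sorted(arcs))                  # [('x', 'w'), ('x', 'z'), ('z', 'w')]
print(is_recursive(mapping, arcs))   # True
```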

The following lemma shows that if M is a recursive model, then there exists exactly
one mapping from equations to variables:

Lemma 1. If M = (V, E, φ) is a recursive structural equation model then any causal
mapping φ′ : V → E must be identical to φ: i.e., φ(v) = φ′(v) for all v ∈ V.

Proof. (sketch) This can be proven by induction by ordering the variables according
to the topological sort of the graph, and showing for any mapping φ′ that if all the
parents of a variable x are assigned according to φ then x must be also. The base
case corresponds to an exogenous variable x0 which must be assigned to φ(x0) since
that equation must have x0 as its only parameter. □

Causal inference may require the structure of the causal graph to be altered prior 
to performing probabilistic inference; in particular, it is made possible by a criti- 
cal postulate which we call the Manipulation Postulate. All formalisms for causal 
reasoning take the Manipulation Postulate as a fundamental starting point: 

Postulate 1 (Manipulation Postulate) If G = (V, E) is a causal graph and V′ ⊆ V
is a subset of variables being manipulated, then the causal graph, G′, for the
manipulated system is such that G′ = (V, E′), where E′ ⊆ E and E′ differs from E by
at most the set of arcs into V′.

In plain words, manipulating a variable can cause some of its incoming arcs to be 
removed from the causal graph, but can effect no other change in the graph. We 
say that a manipulation on v is perfect if all incoming arcs are removed from v 
in the manipulated graph. For the duration of this paper we will assume that all 
manipulations are perfect. This postulate is related to the well-known “do” operator 
of Pearl [7] in that a perfect manipulation on a system specified by a causal graph 
G will be correctly modelled by applying the do operator to G if and only if the 
Manipulation Postulate holds. 
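Under the Manipulation Postulate, a perfect manipulation is a purely graphical operation. The minimal sketch below shows what the postulate predicts; the function name and the three-arc example graph are hypothetical.

```python
def manipulate(arcs, manipulated):
    """Graph predicted by the Manipulation Postulate for a perfect manipulation.

    arcs:        set of (parent, child) pairs of the causal graph
    manipulated: set of variables being manipulated (V')
    Every arc into a manipulated variable is removed; nothing else changes.
    """
    return {(p, c) for p, c in arcs if c not in manipulated}


arcs = {("x", "z"), ("x", "w"), ("z", "w")}
print(sorted(manipulate(arcs, {"z"})))  # [('x', 'w'), ('z', 'w')]
```

The result is always a subset of the original arc set, so the operation can neither add an arc nor reverse one.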
Manipulation inferences require only graphs (for qualitative inference), and maybe
probability distributions (for quantitative predictions). This fact makes common tools
used in causal modeling, for example causal discovery, useful from a causal inference 
perspective. It allows us to learn a causal graph from data and feel confident that 
such a graph can be used to predict the effects of manipulation, without detailed 
knowledge of equations underlying the graph. It is this fact which makes the Manip- 
ulation Postulate so important, because without it a causal graph and a probability 
distribution would not be sufficient to allow manipulation inferences. 

3 Violating the Manipulation Postulate 

Druzdzel [1], and Spirtes et al. [10] have pointed out that, contrary to the Manip- 
ulation Postulate, some systems appear to exhibit reversibility when manipulated. 
The standard example of a reversible system is the transmission of a bicycle. In
normal operation, the rotation rate of the pedals is fixed and the wheels rotate in
response; the following causal graph describes this system: Pedal Rotation Rate →
Wheel Rotation Rate. However, if the bike is propped up on a bike rack and the wheel
is directly rotated at some rate, then the pedals will rotate in response. The causal
ordering of the system under these circumstances yields: Wheel Rotation Rate →
Pedal Rotation Rate. The mere citing of physical examples, however, is not a
completely satisfying demonstration that a correctly modeled system can violate the
Manipulation Postulate. For example, perhaps there are hidden variables at play in 
our examples that, once included into the model, will produce a model that does 
not violate the Manipulation Postulate. Here we provide examples of systems which 
appear to violate the Manipulation Postulate in ways other than merely flipping arcs 
between manipulated children and their parents, suggesting that the problem of re- 
versibility is a more general problem than originally supposed. All examples in this 
section possess recursive graphs, thus according to Lemma 1 their causal mappings 
are unique. 

Reversibility is especially troubling from the point of view of automated causal 
discovery. It appears that manipulation inferences are possible only on models for 
which we have a strong understanding of the domain in the form of equations. Un- 
fortunately, after learning a causal model from data, the only knowledge we have 
typically consists of an automatically discovered graph along with an automatically 
discovered probability distribution. 



The Ideal Gas System Figure 2 displays one of the simplest physical systems. This 
system is comprised of an ideal gas trapped in a chamber with a movable piston, on 
top of which sits a mass, m. The temperature, T, of the gas is controlled externally 
by a temperature reservoir placed in contact with the chamber. Therefore, m and 
T can be controlled directly and so will be exogenous variables in our model of this 
system. 





$F_b = mg$        $P = F_b/A$
$T = T_0$        $h = nRT/(PA)$



Fig. 2. Causal model of the ideal gas in equilibrium. 
The equations presented in Figure 2 assume that the system is in equilibrium.
That is, in a hypothetical experiment where m and T are set to some arbitrary
values, there is an implicit time delay in measuring the remaining variables sufficient
to allow all time-variation in their values to stabilize. Figure 2 shows the causal graph
given by constructing a causal mapping for this system. In words: “In equilibrium,
the force applied to the bottom of the piston must exactly balance the mass on top of
the piston. Given the force on the bottom of the piston, the pressure of the gas must be
determined, which together with the temperature determines the height of the piston
through the ideal gas law.”

Consider what happens when the height of the piston is set to a constant value: 
h = ho. Physically this can be achieved by inserting pins into the walls of the chamber 
at the desired height, as shown in Figure 3. Applying the Manipulation Postulate to
the model in Figure 2 yields the graph with the arcs $P \to h$ and $T \to h$ removed, as
depicted in Figure 3(b).
Fig. 3. The ideal gas model violates the Manipulation Postulate when h is manipulated.

Fig. 4. No equilibrium model exists after manipulating $F_b$.



What is the true causal graph for this system? Fortunately since this is a simple 
system which we understand well, we are able to write down the governing equations, 
given in Figure 3(c). Constructing the causal mapping (unique by Lemma 1) for 
these equations yields the graph shown. In words: Since h and T are both fixed, P
is determined by the ideal gas law, $P = kT/h$. Since the gas is the only source of
force on the bottom of the piston, $F_b$ is determined by P: $F_b = PA$. Thus, P is no
longer determined by $F_b$, and $F_b$ becomes independent of m. It is clear that the true
causal model differs from that predicted by the Manipulation Postulate. Furthermore,
although some arcs have been reversed in the graph, one has been deleted ($m \to F_b$)
and another has changed in an apparently arbitrary fashion ($T \to h$ changed to
$T \to P$). This causal graph is exactly the one that would be learned from data using
the manipulated system to generate the data, as can be verified by calculating the 
independencies between variables using the equations of Figure 3 (c) with independent 
error terms. 
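
The comparison between the two graphs can be reproduced mechanically. The sketch below assumes a recursive, self-contained system and repeatedly looks for an equation with exactly one undetermined variable, which becomes the child of the other variables in that equation. The function name `causal_graph`, the equation labels, and the treatment of exogenous variables as single-variable equations are assumptions made for illustration; the variable sets follow the equations quoted in the text.

```python
def causal_graph(equations):
    """Derive the arcs of a recursive causal graph.

    `equations` maps a label to the set of variables the equation mentions.
    Assumes the system is self-contained and recursive (acyclic).
    """
    determined = set()
    arcs = set()
    pending = dict(equations)
    while pending:
        progress = False
        for label, variables in sorted(pending.items()):
            free = set(variables) - determined
            if len(free) == 1:
                (child,) = free
                arcs |= {(parent, child) for parent in set(variables) - free}
                determined.add(child)
                del pending[label]
                progress = True
                break
        if not progress:
            raise ValueError("system is not recursive and self-contained")
    return arcs


# Equilibrium model of Figure 2 (m and T exogenous).
equilibrium = {
    "m": {"m"}, "T": {"T"},
    "Fb=mg": {"Fb", "m"}, "P=Fb/A": {"P", "Fb"}, "h=nRT/PA": {"h", "P", "T"},
}
# Governing equations once h is pinned (h, m and T exogenous).
pinned = {
    "m": {"m"}, "T": {"T"}, "h": {"h"},
    "P=kT/h": {"P", "T", "h"}, "Fb=PA": {"Fb", "P"},
}
print(sorted(causal_graph(equilibrium)))
print(sorted(causal_graph(pinned)))
```

Cutting the arcs into h in the first graph would leave $m \to F_b \to P$ intact, whereas the second graph contains $T \to P$, $h \to P$ and $P \to F_b$, which is the discrepancy described above.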

There are other, even more dramatic problems with manipulating variables in 
this model. Refer back to the original ideal gas model of Figure 2. Imagine that for 
some reason we want to minimize h; it would not be unreasonable, given the graph in 
Figure 2, to set the value of h by applying a manipulation to $F_b$, since $F_b$ is a causal
ancestor of h. In particular, in order to make h as small as possible, we would want
to make $F_b$ as large as possible according to Figure 2.

Consider what happens when $F_b$ is manipulated in this way. In the real system,
the force on the bottom of the piston can be set independently of the mass by raising a
movable stage up through the chamber and directly applying the desired force to the
piston with the stage, as shown in Figure 4(a). Something very unexpected happens
under this manipulation. Rather than getting the model of Figure 4 (b), expected by 
the Manipulation Postulate, unless by coincidence the force applied exactly balances 
the force due to the mass, the piston will continually be accelerated out of the cylin- 
der, and h, which we intended to minimize, instead grows without bound. Not only 
does this manipulation violate the postulate, but even worse, we have discovered a 
dynamic instability in the system, i.e., there is no equilibrium model; a fact which 
a causal graph alone provides no indication of. If this example seems exaggerated it 
is only because we have some concrete understanding about the equations underly- 
ing this system. However, imagine applying manipulations to automatically learned 
models of complex socio-economic or medical systems, where our basic knowledge is 
at most fragmentary. Performing manipulations on such models could have unpre- 
dictable effects, to say the least. 

4 Dynamic Causal Models 

Manipulating the force in the ideal gas model led to an instability. This effect gives us 
a clue as to what is happening, namely, underlying the equilibrium ideal gas model is 
a dynamic system. When certain manipulations are made, this dynamic system may 
not possess an equilibrium point; the result is the hidden instability discovered in the 
ideal gas system. To understand the phenomenon, we must first discuss how to model 
this system on a finer time scale. 

The issue of modeling a causal system on varying time scales and relating models 
on those time scales was addressed by Iwasaki and Simon [5] . The key points that we 
take from their work are the following: (1) It is possible to model dynamic systems 
on many different time scales, (2) The causal graphs will not necessarily be the same 
for different time scales, and (3) The causal models based on shorter time scales can 
be used to derive models on longer time scales by applying the equilibration operator. 

Consider again the experiment we performed in Section 3. After we dropped a new 
mass on the piston and changed T, we waited some length of time for the piston to 
come to rest, then measured all of our variables. In this experiment, on the contrary, 
we will begin measuring our variables some time t after we have dropped the mass 
on the piston. If we repeated this experiment several times we would find that the 
independencies and the equations governing this dynamic behavior will in general be 
entirely different from those in equilibrium. 

Structural equation models were used in [5] to handle time-dependent systems by 
modeling the system at fixed, discrete time intervals. This is accomplished by creating 
new variables for each time slice, and adding differential equations that may relate 
variables across time slices. From a modeling perspective, time-dependent models and 
graphs are thus no different in principle from equilibrium structural equation models. 
Finding a causal mapping over these sets of equations would again define a directed 
acyclic graph (in the recursive case), where some arcs might go across time slices. 

We will illustrate the features of this technique by presenting the dynamic causal
model of the ideal gas system. There are four physical laws: (1) Weight of a mass:
$F_t = mg$, (2) Newton’s second law: $\sum_i F_i = ma$, (3) the Ideal gas law: $P = kT/h$,
and (4) the Pressure-force relationship: $P = F_b/A$, where a is the acceleration of the
piston and all other variables are as defined in Figure 2. In addition to these physical
laws, the system is constrained by the definition of acceleration and velocity of the
piston (expressed in discrete form):

$v_{(t)} = v_{(t-1)} + a_{(t-1)}\,\Delta t$ and $h_{(t)} = h_{(t-1)} + v_{(t-1)}\,\Delta t$

where we have used the notation that $x_{(t)}$ refers to the value of variable x at time slice
t, and $\Delta t$ is the (constant) time between slices. In order to specify a particular solution
to these difference equations, initial conditions must be given for h: $h_{(0)} = h_0$ and for
v: $v_{(0)} = v_0$, where $h_0$ and $v_0$ are constants. Finally, since m and T are exogenous,
we have $m_{(t)} = m_0$ and $T_{(t)} = T_0$, for all t.

This model relates all the variables in our model at t = 0 with each other and with
v and h at t = 1. Since $h_{(1)}$ and $v_{(1)}$ are now determined at t = 1, we can recursively
iterate this procedure to generate causal graphs for arbitrary values of t.
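
The recursion can be made concrete with a small simulation. The sketch below iterates the two difference equations together with the four physical laws; it is an illustration under stated assumptions rather than the authors' implementation. In particular, the sign convention (upward positive, so the net force on the piston is $F_b - F_t$), the parameter names, the time step `dt`, and the optional `applied_force` argument (which replaces the gas force by a constant, as in the stage experiment of Section 3) are all introduced here.

```python
def simulate_piston(m, T, k, A, h0, v0, dt, steps, g=9.81, applied_force=None):
    """Iterate the discrete dynamic ideal gas model and return the heights h_(0..steps)."""
    h, v = h0, v0
    heights = [h]
    for _ in range(steps):
        F_t = m * g                      # (1) weight of the mass
        if applied_force is None:
            P = k * T / h                # (3) ideal gas law
            F_b = P * A                  # (4) pressure-force relationship
        else:
            F_b = applied_force          # F_b set directly by the stage (assumed manipulation)
        a = (F_b - F_t) / m              # (2) Newton's second law, upward positive (assumption)
        # Both updates use the values from the previous time slice.
        h, v = h + v * dt, v + a * dt
        heights.append(h)
    return heights


m, T, k, A, g = 1.0, 300.0, 1.0, 1.0, 9.81
h_eq = k * T * A / (m * g)               # height at which F_b balances F_t

at_rest = simulate_piston(m, T, k, A, h0=h_eq, v0=0.0, dt=0.01, steps=100)
pushed = simulate_piston(m, T, k, A, h0=h_eq, v0=0.0, dt=0.01, steps=100,
                         applied_force=2 * m * g)
print(at_rest[-1] - h_eq, pushed[-1] - h_eq)
```

Started at the balance point, the unmanipulated model stays put; with a constant applied force larger than the weight, the acceleration never changes sign and h keeps increasing, which is the runaway behaviour described for the manipulation of $F_b$.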

Since this graph is Markovian through time, i.e., the variables in the future are
d-separated from variables in the past by variables in the present, it can be represented
by a convenient shorthand graph for an infinite sequence of time steps. In this
shorthand graph temporal subscripts can be dropped and we use special dashed links,
labelled integration links [5], to denote that a causal relationship is really occurring
through a time slice. The shorthand dynamic causal graph for the ideal gas system
is shown later in Figure 5(a). Since these shorthand graphs are based on differential
equations, they always make the assumption that if $\dot{x}$ and x are present in the model
then $\dot{x} \to x$ across time slices.



4.1 Deriving Equilibrium Models from Dynamic Models 

The dynamic graph in Figure 5 (a) represents the causal graph for the system modelled 
over an infinitesimal time scale; whereas, the graph from Figure 2 is modelled over 
a time scale that is long enough for the system to come to equilibrium. Here we 
formally define dynamic models and we review how to use the equilibration operator 
to derive an equilibrium model from the dynamic model. We will use the notation
that $\dot{v} = dv/dt$ and that $v^{(0)} = v$ and $v^{(i+1)} = dv^{(i)}/dt$.

The shorthand dynamic graph presented in Figure 5 (a) adds some confusion to 
the concept of recursivity, since it possesses cycles itself although it really is meant 
to represent an acyclic graph that is unrolled in time. Thus to clear up confusion we 
generalize the concept of recursivity for a shorthand graph: 

Definition 4 (recursive causal model). A dynamic causal model M = (V, E, )
with a causal graph G is recursive if and only if the causal graph $G^{(0)}$ obtained by
removing all integration links from G is acyclic.

Definition 5 (dynamic variable). Given a causal model M = (V, E, ) with graph
G, a variable $v \in V$ is a dynamic variable if and only if $\dot{v} \in Pa(v)_G$.

The operation of equilibration was presented in Iwasaki and Simon [5] whereby
the derivatives of a dynamic variable x are eliminated from a model by assuming that
x has achieved equilibrium:

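Definition 6 ($V_{del}(x)$, $E_{del}(x)$). Let M = (V, E, ) be a causal model with $x \in V$
and with $x^{(n)} \in V$ the highest order derivative of x in the model, then:

$V_{del}(x) = \{x^{(i)} \mid 0 \le i \le n,\ i \ne 0\}$ and $E_{del}(x) = \{e \mid e$ is the equation associated with $x^{(i)},\ 0 \le i < n\}$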

Fig. 5. (a) The dynamic ideal gas causal graph, (b) Manipulating h, (c) Manipulating $F_b$.

Note that $x \notin V_{del}(x)$ and that the equation associated with $x^{(n)}$ is not in $E_{del}(x)$.

Definition 7 (equilibration). Let M = (V, E, ) be a causal model and let $x \in V$
be a dynamic variable with $x^{(n)} \in V$ the highest order derivative of x in V. The model
$M_x$ = ($V_x$, $E_x$, ) due to the equilibration of x is obtained by the following procedure:

1. Let $V_x = V \setminus V_{del}(x)$,

2. Let $E_x = E \setminus E_{del}(x)$,

3. For each $e \in E_x$ set $v = 0$ for all $v \in V_{del}(x)$.

4. Construct a new causal mapping for $V_x$ and $E_x$.

Equilibration is equivalent to assuming that a dynamic variable x has achieved equi- 
librium. This implies that all of x’s derivatives will be zero. Equilibration can cause 
the remaining set of equations to be non-self-contained. We call equilibration well- 
defined if this does not happen. 

Definition 8 (equilibrium model). A causal model M = (V, E, ) is an equilibrium
model with respect to x for some $x \in V$ if and only if x is not a dynamic variable
in M.

Definition 9 (equilibrated model). A causal model $M_x$ = ($V_x$, $E_x$, ) is an
equilibrated model with respect to x if and only if $M_x$ is derived from a dynamic model
M = (V, E, ) by performing a well-defined equilibration on $x \in V$, and x is a dynamic
variable in M.
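
As a worked illustration (not taken from the paper), consider equilibrating h in the dynamic ideal gas model using the four physical laws listed in Section 4. The derivation assumes that the only forces on the piston are $F_b$ acting upward and $F_t$ acting downward, so that Newton's second law reads $F_b - F_t = ma$; this sign convention is an assumption made here.

```latex
\begin{align*}
\dot{h} = v = 0,\quad \dot{v} = a = 0
  &\;\Longrightarrow\; F_b - F_t = ma = 0
   \;\Longrightarrow\; F_b = F_t = mg, \\
P &= F_b / A, \\
P = kT/h &\;\Longrightarrow\; h = kT/P .
\end{align*}
```

Setting the derivatives of h to zero therefore recovers the equations of the equilibrium model in Figure 2, with h now determined by P and T rather than by integration of v.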

4.2 Manipulating Dynamic Models 

We now examine the phenomena observed in the ideal gas system from the viewpoint 
of dynamics. Let us again fix the height of the piston, using the model of Figure 5 (a) 
to describe the ideal gas system. To fix the piston, we must set h to some constant
value for all time, $h_{(t)} = h_0$. We also must stop the piston from moving so we must set
$v_{(t)} = 0$ and $a_{(t)} = 0$. Thus, in the dynamic graph with integration links, we can think
of this one action of setting the height of the piston as three separate actions. If we
assume that the Manipulation Postulate holds on the dynamic model in Figure 5(a),
we obtain the graph depicted in Figure 5(b). Since h is being held constant, this graph
is already an equilibrium graph with respect to h (i.e., no equilibration operation is
required). By comparing Figure 5(b) to the manipulated equilibrium ideal gas system
of Figure 3(c), we can see that aside from the extra variables that were added to the
dynamic model for clarity ($F_t$, a and v), Figure 5(b) is identical to the expected
manipulated model. Therefore, the Manipulation Postulate holds for this model, and
it produces precisely the graph that we originally expected to get but were unable to 
get from the equilibrium model. 

Dynamic models can also be used to predict when a manipulation will cause 
an instability. In order to demonstrate this, we first need to review a key result 
about stability in dynamic systems. If, within a dynamic model, a dynamic variable
x possesses a fixed-point solution at, say, $x = x_0$, then that fixed-point will be a stable
fixed-point if and only if the following stability relation [11] holds:

$\left.\frac{\partial \dot{x}}{\partial x}\right|_{x_0} < 0$    (stability condition)

where $\dot{x}$ is the time-derivative of x.

According to this stability condition, the variable $\dot{x}$ must somehow be a function
of x for stability to occur. What does this imply about dynamic causal models? In
order for stability to occur, there must exist some regulation process by which $\dot{x}_{(t)}$ can
get information about $x_{(t')}$ for some $t' < t$. In our dynamic model, for example, this
regulation takes place through the feedback loop: $h_{(t)} \to P \to F_b \to a_{(t)}$.

The stability condition thus suggests a structural condition for stability in a causal 
graph: 

Definition 10 (The Structural Stability Principle). Let G be a causal graph
with dynamic variable v, and let Fb(v) denote the set $Fb(v) = Anc(v)_G \cap Des(v)_G$;
then v will possess a stable fixed-point only if $Fb(v) \neq \emptyset$.

Consider the implications of manipulating $F_b$ in the dynamic model of the ideal
gas system. If we again assume that the Manipulation Postulate holds for the dynamic
model, when $F_b$ is manipulated in Figure 5(a), the model shown in Figure 5(c) is
obtained. We can see immediately from the causal graph that this manipulation
will break the only feedback loop for x in this system, and thus according to the
Structural Stability criterion, there does not exist a stable equilibrium point for this
model. Our second major observation is therefore that the dynamic model, together
with the Manipulation Postulate and the Structural Stability criterion, correctly predicts
that some manipulations will cause an instability.
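
The structural check can be sketched as a reachability computation. Because the figure itself is not reproduced here, the arc list below is a reconstruction of the shorthand graph of Figure 5(a) from the physical laws of Section 4 and should be read as an assumption, as should the function names and the choice to exclude the variable itself from its feedback set. The height h is taken as the dynamic variable of interest.

```python
def reachable(edges, start):
    """Return the nodes reachable from `start` by following arcs forward."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
    seen, stack = set(), [start]
    while stack:
        for nxt in children.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def feedback_set(edges, x):
    """Anc(x) ∩ Des(x), with x itself excluded (a choice made for this sketch)."""
    descendants = reachable(edges, x)
    ancestors = reachable({(child, parent) for parent, child in edges}, x)
    return (ancestors & descendants) - {x}


# Assumed arcs of the shorthand dynamic graph; a -> v and v -> h are integration links.
dynamic = {
    ("m", "Ft"), ("m", "a"), ("Ft", "a"), ("Fb", "a"),
    ("a", "v"), ("v", "h"),
    ("h", "P"), ("T", "P"), ("P", "Fb"),
}
# Perfect manipulation of Fb: cut every arc into Fb.
manipulated = {(p, c) for p, c in dynamic if c != "Fb"}

print(sorted(feedback_set(dynamic, "h")))
print(sorted(feedback_set(manipulated, "h")))
```

With the loop through P and $F_b$ intact the feedback set of h is non-empty; after the arc into $F_b$ is cut it is empty, so the necessary condition of Definition 10 fails.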

5 Theorems 

In this section we formalize the observations suggested by the examples in Section 3. 
For the remainder of this section, let M = (V, E, ) be an arbitrary dynamic causal
model, let $x \in M$ be a dynamic variable in M and let $M_x$ = ($V_x$, $E_x$, ) be the
causal model obtained by performing a well-defined equilibration operation on x. Let
G and $G_x$ be the causal graphs for M and $M_x$, respectively, and $G^{(0)}$ be the graph
corresponding to G with all of x’s integration links removed. We define Fb(x) to be
the set of feedback variables: $Fb(x) = Anc(x)_G \cap Des(x)_G$, and let $V_{del}(x)$ and
$E_{del}(x)$ be defined as in Definition 6.

Definition 11 (RFRE Model, $\mathcal{F}$). $M_{\mathcal{F}}$ is a recursive feedback-resolved equilibrated
(RFRE) model with respect to x if and only if the following conditions hold:

1. Equilibration: $M_{\mathcal{F}}$ is derived from a dynamic model $M_D$ by equilibrating x in
$M_D$,

2. Recursivity: $M_{\mathcal{F}}$ and $M_D$ are both recursive, and

3. Feedback-resolution:

$\{Fb(x) \setminus V_{del}(x)\} \cap Ch(x)_G \neq \emptyset$.

We denote the class of all RFRE models as $\mathcal{F}$, and use $\mathcal{F}(x)$ to denote the set of
RFRE models with respect to x.
Lemma 2. If M is recursive, then there exists an ordering relation O on the
associations of such that:

1. 0{{vi,ef)) < 0{{vj,Cj)) if Vi € Anc(t;j)g,^o), and 

2. the pairs corresponding to Fb(a;) form a contiguous sequence in O. 

Proof. In all such that i n are exogenous by construction (they are speci- 
fied by the initial conditions in the model) . Thus they can be ordered before all other 

V e Fb(x). Define Anc(Fb(x))g,m = U„gFb(a;) Des(Fb(x))gm = 

U„gFb(a:) Des(f)g(o) to be the set of ancestors and descendants, respectively of Fb(x). 
By transitivity of the ancestor and descendant relationships, if there exists a x S 
Anc(Fb(x)) n Des(Fb(x)) then v G Fb(x). Thus an ordering can be defined such 
that O(vanc) < O(vfb) < O(vdes) for arbitrary variables Vanc G Anc(Fb(x)) \ Fb(x), 
Vdes G Des(Fb(x)) \ Fb(x), and Vft G Fb(x). □ 

Lemma 3. Let F denote the set Vj \ {Fb(x) U {x}}. If M and are recursive then 
x(v) = (v) for all V G F. 

Proof, (sketch) Using Lemma 2 and a recursive proof similar to that of Lemma 1, it 
can be proven that it will always be possible to define a mapping ' such that each 

V G F gets mapped to (x') for some x' C F. It then follows by Lemma 1 that since 

2 is recursive, ' = s. □ 

The next lemma says, informally, that all ancestors of x in Fb(x) that are not 
dynamic variables in must pass through x^^: 

Lemma 4. The following relation holds: Fb(x) \ Vdei(x) C Anc(x^”^) „(o) . 

Proof. First note that if x is a dynamic variable, then in G^\ by construction v must 
be given by initial conditions and so must be exogenous. Therefore, in the chain of 
derivatives: x*^"^ ^ xfo”^^ — > • • • ^ x, all x^®^ such that i ^ n must have a single 
parent which is connected by an integration link. Therefore, all v G Anc(x)c \ Vdei(^) 
must be ancestors of x^\ i.e., Fb(x) \ Vdei{x) C Anc(x*^”^)g(o) . □ 

Lemma 5. If G iF{x) then there does not exist an x^*^ such that x^*^ G Ch(x)G. 

Proof, (sketch) First note that the result follows for all such that j < n, because 
by construction Pa(x*^-^^) = in M. Thus we only need to prove that x^^ ^ 

Ch(x). is recursive by assumption; therefore, by Lemma 1 there only exists one 
causal mapping, j. However, if x^^ G Ch(x) then it can be shown by Lemma 3 
that there exists a mapping ' such that '(x) = (x^^^), and all other variables in 

Vj retain the associations specified by . By Lemma 4 it follows in such case that 
Anc(x) n Des(x) is non-empty, which contradicts the recursivity of j. 

□ 

Lemma 6. If e lF{x), then there exists a v G Vs such that v G Pa(x)G* and 
such that V G Ch(x)c. 

Proof. Define an ordering O for and label the pairs {vi,ei) in according to O 
as in the proof of Lemmas 1 and 3. Let (x, e^) be the association for x in j. By 
construction x Vi, and by Lemma 3, vt G Fb(x). Since x G Params{ei) and since 
{vi,Ci) G it must be the case that Vi G C1i(x)g. Since x^^^ is exogenous in G^^ for 
all I ^ n and since, by Lemma 5, Vi ^ v^'^\ it follows that Vi ^ Vdei{x). Therefore
Vi G Fb(a;) \ Vdei(a;), and since Vi G Params(ei) it must be the case that Vi G Pa{x)G^- 
□ 

Lemma 7. If G lF(x) and M± = {Vx,Ex, x), with causal graph Gx, is the causal 
model resulting when x is manipulated in M , then in G± there will exist an edge x v 
for all V G Ch(a;)G H 14 . 

Proof. Since M obeys the Manipulation Postulate, the only arcs that will be removed 
from M when x is manipulated will be the arcs coming into x and into x’s derivatives 
x^"^\ Since by Lemma 5, x is not a parent of any the children of x must be 
preserved in Gx- □ 

Finally, Theorem 1 presents conditions which are sufficient for Mx to violate the 
Manipulation Postulate. 

Theorem 1 (reversibility). If $M_x \in \mathcal{F}(x)$ and the Manipulation Postulate holds
for M, then the Manipulation Postulate does not hold for $M_x$.

Proof. Manipulating x in M produces an equilibrium model with respect to x, Mx, 
which must be the correct model that is obtained when x is manipulated, by definition 
of the Manipulation Postulate. Let Gx be the causal graph corresponding to M±. Since 
Mx G lF{x), by Lemma 6 there exists a u G Ch(x)G such that u — > a; in Gx', however, 
according to Lemma 7, the edge a; — > u must exist in Gj,. Thus, manipulating x in Gx 
by applying the Manipulation Postulate leads to an incorrect graph Gx\x, because it 
will not contain an edge between v and x. □ 

The theorem is labeled “reversibility” because its proof relies on the guaranteed 
reversal of an arc; nonetheless, it is clear by the examples given in Section 3 that there 
is more complex behavior being exhibited in these systems than mere reversibility. 

The last theorem proves that hidden dynamic instabilities are a mathematical 
feature of some equilibrium causal models: 

Theorem 2 (instability). If $M_x \in \mathcal{F}(x)$, the Manipulation Postulate holds for M
and the Structural Stability condition holds, then there exists a set of variables $V' \subseteq V_x$
such that if $V'$ is manipulated in M, the variable x will become unstable.

Proof. Define $V' = Fb(x) \setminus V_{del}(x)$. It must be the case that $V' \neq \emptyset$ by definition of
$\mathcal{F}(x)$. According to the Manipulation Postulate, manipulating $V'$ in G will create a
new graph $G_{V'}$ with $Fb(x)_{G_{V'}} = \emptyset$. Therefore, according to the Structural Stability
principle, x will be unstable in $G_{V'}$. □



6 Discussion 

We have tried to emphasize the severity of our conclusions on the practice of causal 
discovery from equilibrium data. Because the examples we have presented are based 
on simple systems about which most readers are likely to have a good general un- 
derstanding, the consequences of violating the Manipulation Postulate may not be 
fully appreciated. However, in domains where causal discovery procedures are used to 
elicit causal graphs from data, typically little or no background knowledge is present. 
After discovery, therefore, all knowledge that the modeler possesses is in the form of 
a causal graph and maybe a probability distribution. The theorems presented in this
paper cast significant doubt on the usefulness of a graph so obtained for performing
causal reasoning, because we would have no knowledge of the dynamics underlying
this system. One obvious remedy is to use time-series data to learn dynamic causal
graphs instead of equilibrium models when causal inferences are required. What then
is the minimal information needed to ensure that a model will support manipulation?
Are there general relationships between dynamic models and equilibrium models that
can allow us to answer these questions for arbitrary models? We believe these are
hard questions whose answers would be of significance to future work in causal
reasoning.
Abstract. The term “changes in structure,” originating from work in
econometrics, refers to structural modifications invoked by actions on a
causal model. In this paper we formalize the representation of reversibility 
of a mechanism in order to support modeling of changes in structure in 
systems that contain reversible mechanisms. Causal models built on our 
formalization can answer two new types of queries: (1) When manipulat- 
ing a causal model (i.e., making an endogenous variable exogenous), which 
mechanisms are possibly invalidated and can be removed from the model? 
(2) Which variables may be manipulated in order to invalidate and, effec- 
tively, remove a mechanism from a model? 



1 Introduction 

Graphical probabilistic models, such as Bayesian networks, provide compact and 
computationally efficient representations of problems involving reasoning under 
uncertainty. Users can easily update their belief in the states of a modeled system
by setting evidence in a model that reflects observations made in the real world.
A related formalism of causal models, based on structural equations, in addition 
to observations, supports prediction of the effects of actions, i.e., external manip- 
ulation of modeled systems. Explicit representation of causality in causal models 
enables users to predict the effects of actions, which in turn allows users to perform 
counterfactual reasoning [8,12,16]. 

The problem of predicting the effects of actions was originally referred to in
the econometrics literature as predicting the effects of changes in structure in
simultaneous equation models. Assuming that a modeler has sufficient prior knowledge
to predict the effects of changes in structure, researchers in econometrics modeled
the effects of actions as “scraping” invalid equations and “replacing” them by new
ones [10,13,17,18]. If we assume that the variable manipulated by an action is governed
by an irreversible mechanism (for example, wearing sunglasses protects our
eyes from the sun but it does not make the sun go away), the effect of an action
amounts to an arc-cutting operation on the causal graph describing the situation
[12,16]. However, there exists a large class of reversible mechanisms [4,12,13,15,16,
18,19] that are not amenable to this treatment. For example, a car engine causes
the wheels to turn when going uphill, but the wheels slow down the engine when going
downhill with the transmission in a lower gear. An action may reverse the
direction of causal relations among variables and consequently have drastic effects
on causal graphs.

There have been attempts to assist in predicting the effects of actions on sys- 
tems containing reversible mechanisms. Bogers [1] developed theorems to support 
structure modifications when the equation being scraped by an action governs 
an exogenous variable. Druzdzel and van Leijen [6] studied the conditions under 
which a conditional probability table in a causal Bayesian network can be reversed 
when manipulating a reversible mechanism. Dash and Druzdzel [3] demonstrated 
how various equilibrium systems may violate the arc-cutting operation and fur- 
ther developed differential causal models to solve the problem by modeling systems 
dynamically. 

Our approach to supporting changes in structure is based on our representation 
of reversibility of a mechanism. A mechanism asserts that there exists a relationship 
among a set of variables. We define the reversibility of a mechanism semantically 
on the set of possible effect variables of a mechanism. A set of mechanisms is a 
causal model only if the causal relations among the variables are consistent with 
the reversibility of its mechanisms. Similarly to the STRIPS language [7], we
conceptualize an action as consisting of three lists: PRECONDITION (a causal model),
ADD (the set of mechanisms to be added), and DELETE (the set of mechanisms to
be removed). Consequently, once an action is completely specified, the effect of an
action is simply performing the modifications specified in the ADD and DELETE lists
on the causal model given in PRECONDITION. Given the PRECONDITION
and one of the ADD or DELETE lists of a partially specified action, we prove
two theorems to assist modelers in answering two new types of queries: (1) When 
manipulating a causal model, which mechanisms are possibly invalidated and can 
be removed from the model? (2) Which variables may be manipulated in order to 
invalidate and, effectively, remove a mechanism from a model? As an extension 
of existing approaches [1,3,12,16], we formalize the representation of reversibility 
of a mechanism and assist modelers in predicting the effects of actions in systems 
consisting of mixtures of mechanisms. 
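
The three-list view of an action can be sketched as a small data structure. The class name `Action`, its field names, and the representation of mechanisms as opaque labels are assumptions made for this illustration; the paper does not prescribe an encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A fully specified action in the STRIPS-like sense described above."""
    precondition: frozenset           # the causal model the action applies to
    add: frozenset = frozenset()      # mechanisms to be added
    delete: frozenset = frozenset()   # mechanisms to be removed

    def apply(self, model):
        """Return the modified set of mechanisms, or raise if not applicable."""
        model = frozenset(model)
        if model != self.precondition:
            raise ValueError("model differs from PRECONDITION")
        if not self.delete <= model:
            raise ValueError("DELETE lists a mechanism that is not in the model")
        return (model - self.delete) | self.add


# Hypothetical model with three mechanisms; the action replaces e2 by e2_new.
model = frozenset({"e1", "e2", "e3"})
action = Action(precondition=model,
                add=frozenset({"e2_new"}),
                delete=frozenset({"e2"}))
print(sorted(action.apply(model)))  # ['e1', 'e2_new', 'e3']
```

Only the bookkeeping is shown here; deciding which mechanisms belong in ADD or DELETE for a partially specified action is the subject of the two queries stated above.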



2 Structural Equation Models and Causal Ordering 

The work in simultaneous equation models (SEMs) is the root of the work on
graphical causal models [8,12,16]. Given an equation e, we denote the set of variables
appearing in e as Vars(e). The set of variables appearing in a set of equations
E is denoted as $Vars(E) = \bigcup_{e \in E} Vars(e)$. A structural equation model can be
defined as a set of structural equations $E = \{e_1, e_2, \ldots, e_m\}$ on a set of variables
$V = \{v_1, v_2, \ldots, v_n\}$ appearing in E, i.e., V = Vars(E). Each structural equation
$e_i \in E$, generally written in its implicit form $e_i(v_1, v_2, \ldots, v_n) = 0$, describes a
conceptually distinct mechanism active in a system.¹ A variable $v_j \in V$ is exogenous
if it is determined by factors outside the model, i.e., if there exists a
structural equation $e_i(v_j) = 0$ in E. A variable is endogenous if it is determined
by solving the model. We denote the set of exogenous and endogenous variables
in E as ExVars(E) and EnVars(E) respectively. E is independent if there is no
$e_i \in E$ such that $e_i$ is satisfied by all simultaneous solutions of any subset of
$E \setminus e_i$. E is consistent if the solution set of E is not empty. In order to ensure that
E is independent and consistent, Simon and Rescher [15] defined the concept of
structure:

Definition 1. A structure is a set of equations E where $|E| \le |Vars(E)|$ such
that in any subset $E' \subseteq E$: (1) $|E'| \le |Vars(E')|$, and (2) If the values of any
$|Vars(E')| - |E'|$ variables in $Vars(E')$ are chosen arbitrarily, then the values of
the remaining $|E'|$ variables are determined uniquely.

¹ Every structural equation normally contains an error term to represent disturbance
due to omitted factors. We will leave out error terms for the simplicity of exposition.

A SEM E is self-contained if E is a structure and $|E| = |V|$. E is under-constrained
if E is a structure and $|E| < |V|$. E is over-constrained if E is not a
structure. Whenever $|E| > |V|$, E is over-constrained. In general, we use a
self-contained SEM to describe an equilibrium system since the set of equations is
consistent and independent, and the values of variables are determined uniquely.
A self-contained structure E is minimal if it does not contain any proper subset
of equations in E which is self-contained. A minimal self-contained structure is
a strongly coupled component if it contains more than one equation. A set of
equations E can be represented qualitatively as a matrix, called structure matrix
[5,13,15], with element $a_{ij} = x$ if $v_j \in V$ participates in $e_i \in E$, where x is a
marker, and $a_{ij} = 0$ otherwise (see Fig. 1).



      v1  v2  v3  v4  v5  v6  v7  v8
e1    x   0   0   0   0   0   0   0
e2    0   x   0   0   0   0   0   0
e3    0   0   x   0   0   0   0   0
e4    0   x   0   x   x   0   0   0
e5    0   x   x   x   x   0   0   0
e6    0   0   x   0   x   x   0   0
e7    0   0   0   x   0   0   x   0
e8    x   0   0   0   0   0   x   x




Fig. 1. COA takes a self-contained structure as input and outputs a causal graph. 



As shown by Simon [13], a self-contained structure exhibits asymmetries among
variables that can be represented by a special type of directed acyclic graph
and interpreted causally. He developed a causal ordering algorithm (COA) that
takes a self-contained structure $E$ as input and outputs a causal graph
$G_E = \langle N, A \rangle$ where $N = \{N_1, N_2, \ldots, N_r\}$ is a
partitioning of $V$, consisting of pairwise disjoint sets such that
$\bigcup_{i=1}^{r} N_i = V$, and $A$ is a set of directed arcs $v \to N_i$
where $v \in V$, $N_i \in N$, and $v \notin N_i$. COA starts with identifying
the minimal self-contained structures in $E$. These identified minimal
self-contained structures, $C^0 = \{C^0_1, C^0_2, \ldots, C^0_k\}$, are called
complete structures of 0-th order, and a partition $N^0_i$ on $V$ is created
for $Vars(C^0_i)$ for each $C^0_i \in C^0$. For each variable $v \in N^0_i$, a
corresponding node is created. When a minimal self-contained structure is a
strongly coupled component, i.e., $|C^0_i| > 1$, we draw the nodes created for
variables in $N^0_i$ as overlapping circles because their values need to be
solved simultaneously. Next, COA solves for the values of $Vars(C^0)$, removes
$C^0$ from $E$, and substitutes the solved values of $Vars(C^0)$ into the
remaining structure $E \setminus C^0$ to obtain the derived structure of the
first order, $E^1$. COA repeats the process of identifying, removing, solving,
and substituting on the derived structure of p-th order until it is empty. In
addition, whenever a partition $N^p_i$ and corresponding nodes are created for
a complete structure $C^p_i$ in the complete structures of p-th order, COA
refers $C^p_i$ back to its equations before any substitutions in $E$, denoted
as $\hat{C}^p_i$, and adds arcs from nodes representing variables in
$Vars(\hat{C}^p_i) \setminus Vars(C^p_i)$ to the nodes representing $N^p_i$.
Notice that COA creates a one-to-one mapping, denoted as $(C^p_i, N^p_i)$,
between the set of equations, $C^p_i$, and the set of variables, $N^p_i$, in a
causal graph. We say that $C^p_i$ is mapped to $N^p_i$ or vice versa in $G_E$
(see Example 1).

Since the concept of endogenous and exogenous variables relative to the struc- 
ture before substitutions of a complete structure of p-th order [13] plays an im- 
portant role in the rest of the discussion, we introduce it formally as follows. 

Definition 2. Let $C^p$ and $C^q$ be complete structures of p-th and q-th order
respectively in a self-contained structure $E$. Let $\hat{C}^p_i$ be the
structure before any substitutions of a complete structure $C^p_i \in C^p$ in
$E$ and $v \in Vars(\hat{C}^p_i)$. We say that $v$ is endogenous in
$\hat{C}^p_i$ if $v \notin Vars(C^q)$ for all $q < p$, and $v$ is exogenous in
$\hat{C}^p_i$ if $v \in Vars(C^q)$ for some $q < p$. We denote the sets of
endogenous and exogenous variables in $\hat{C}^p_i$ by $EnVars(\hat{C}^p_i)$
and $ExVars(\hat{C}^p_i)$ respectively.

From Definition 2, we know that each variable $v$ in a self-contained $E$ can
appear as an endogenous variable in only one $\hat{C}^p_i$. We define the
necessary structure for $v$ in $E$ to support changes in structure defined in
Sect. 4.2.

Definition 3. Let $G_E$ be the causal graph generated by applying COA to a
self-contained structure $E$. Let $v \in N^p_i$ and $Anc(N^p_i)$ be the
ancestral set of $N^p_i$ in $G_E$. The necessary structure for $v$, denoted as
$NS_v$, is the set of equations that are mapped to $N^p_i \cup Anc(N^p_i)$ by
COA.

It is easy to see that a necessary structure is self-contained. In other words,
$NS_v$ consists of all equations in $E$ that are necessary to determine $v$
uniquely.

Example 1. In Fig. 1, COA takes the structure matrix as input and identifies
$C^0 = \{\{e_1\}, \{e_2\}, \{e_3\}\}$, $C^1 = \{\{e_4, e_5\}\}$,
$C^2 = \{\{e_6\}, \{e_7\}\}$, and $C^3 = \{\{e_8\}\}$ to generate the causal
graph. The mappings between equations and variables are $(e_1, v_1)$,
$(e_2, v_2)$, $(e_3, v_3)$, $(\{e_4, e_5\}, \{v_4, v_5\})$, $(e_6, v_6)$,
$(e_7, v_7)$, and $(e_8, v_8)$. From the causal graph, we may read off the
causal relations among sets of variables. For example, $\{v_4, v_5\}$ is caused
by $v_2$ and $v_3$, $v_6$ is caused by $v_3$ and $v_5$, and $v_7$ is caused by
$v_4$. We may also read off indirect causal relations such as that $v_3$ is an
indirect cause of $v_7$. However, the causal relations between $v_4$ and $v_5$
are undefined, since they are in a strongly coupled component. Notice that
$v_4$ is endogenous in $\hat{C}^1_1 = \{e_4, e_5\}$ but exogenous in
$\hat{C}^2_2 = \{e_7\}$. The necessary structure for $v_4$ is
$\{e_2, e_3, e_4, e_5\}$.
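The causal ordering procedure and the necessary structure of Definition 3 can be sketched qualitatively on the structure matrix of Fig. 1. The following is a minimal illustration, not Simon's original formulation: the names `FIG1`, `causal_ordering`, and `necessary_structure` are our own, equations are reduced to the sets of variables they mention, and minimal self-contained subsets are found by brute-force enumeration, which is only practical for small examples.

```python
from itertools import combinations

# Structure matrix of Fig. 1: equation -> variables marked "x" in its row.
FIG1 = {
    "e1": {"v1"}, "e2": {"v2"}, "e3": {"v3"},
    "e4": {"v2", "v4", "v5"}, "e5": {"v2", "v3", "v4", "v5"},
    "e6": {"v3", "v5", "v6"}, "e7": {"v4", "v7"}, "e8": {"v1", "v7", "v8"},
}

def causal_ordering(equations):
    """Return (orders, parents) for a qualitatively self-contained structure."""
    remaining, solved = dict(equations), set()
    orders = []   # orders[p]: list of (equations, variables) pairs of p-th order
    parents = {}  # node (frozenset of variables) -> variables with arcs into it
    while remaining:
        # Derived structure: drop variables already solved at lower orders.
        derived = {e: vs - solved for e, vs in remaining.items()}
        minimal = []
        for k in range(1, len(derived) + 1):
            for combo in combinations(sorted(derived), k):
                if any(eqs <= set(combo) for eqs, _ in minimal):
                    continue  # contains a smaller self-contained subset
                variables = set().union(*(derived[e] for e in combo))
                if len(variables) == k:
                    minimal.append((frozenset(combo), frozenset(variables)))
        if not minimal:
            raise ValueError("input is not a self-contained structure")
        for eqs, node in minimal:
            # Refer back to the equations before substitution to draw arcs.
            before = set().union(*(equations[e] for e in eqs))
            parents[node] = before - node
            solved |= node
            for e in eqs:
                del remaining[e]
        orders.append(minimal)
    return orders, parents

def necessary_structure(v, orders, parents):
    """Equations mapped to the node of v and to all of its ancestors."""
    node_of = {x: node for level in orders for _, node in level for x in node}
    eqs_of = {node: eqs for level in orders for eqs, node in level}
    needed, stack = set(), [node_of[v]]
    while stack:
        node = stack.pop()
        if eqs_of[node] <= needed:
            continue  # this node was already visited
        needed |= eqs_of[node]
        stack.extend(node_of[p] for p in parents[node])
    return needed

orders, parents = causal_ordering(FIG1)
ns_v4 = necessary_structure("v4", orders, parents)
```

On the Fig. 1 matrix, `orders` contains four levels that correspond to $C^0$ through $C^3$ of Example 1, and `ns_v4` is the set of equations named as the necessary structure for $v_4$.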
3 Reversible Mechanisms

Like any other scientific modeling, structural equation modeling requires us to 
clearly relate our definitions of variables and structural equations in a SEM to a 
system in the real world. In general, we start with identifying entities involved 
in a system. An entity can be a single object (e.g., a patient), a population of 
similar objects (e.g., male patients in a hospital), or a group of relevant objects 
(e.g., patients, doctors, and insurance company in a health system). We then define 
variables to refer to characteristics of entities (e.g., age of a patient) and define 
structural equations to describe the linkages among variables (mechanisms) in 
the system. Our prior domain knowledge serves as a guideline in hypothesizing 
which mechanisms are involved in a system. Therefore, the definitions of structural 
equations and variables in a SEM are a-priori [13,18]. Simon [14] suggested three 
classes of sources for specifying mechanisms: experimental manipulation, temporal 
ordering, and “tangible” links. In [12,18], researchers stressed that mechanisms
should be autonomous in the sense that an external change to any one of the
mechanisms does not imply a change in the others. For the purpose of illustration,
we define mechanisms as follows. 

Definition 4. A mechanism $e$, represented as a structural equation
$e(v_1, v_2, \ldots, v_n) = 0$, asserts that there exist autonomous linkages
among the set of variables $\{v_1, v_2, \ldots, v_n\}$.

Simon [14] further pointed out that different a-priori assumptions for one mech- 
anism may lead to different interpretations of causal relations among variables. 
For example, schooling helps to increase verbal ability in one experimental con- 
text, but verbal ability helps in getting higher schooling in another. He used the 
term causal mechanisms to refer to mechanisms considered under different a-priori 
assumptions. In other words, each causal mechanism represents a distinct theory
that we hypothesize about the observation of a phenomenon in the real world and
is written as a function to explicitly describe the relation between the effect
variable and its causes.

Definition 5. Given a mechanism $e$, a causal mechanism, $v = f(Pa(v))$,
describes a function $f$ between the effect variable $v \in Vars(e)$ and its
direct causes $Pa(v) = Vars(e) \setminus \{v\}$. We say that $v = f(Pa(v))$ is
instantiated from $e$.

Generally, there may be more than one causal mechanism instantiated from 
a mechanism as long as the functions formalized are consistent with the a-priori 
assumptions. In practice, we believe that people tend to first express a causal 
mechanism qualitatively as a specification of the effect variable and its causes, and 
later give it an explicit function. Assuming that the number of variables appearing 
in a mechanism is fixed, the number of possible effect variables for a mechanism 
is finite. Consequently, we can classify mechanisms into four categories according
to their reversibility: (1) completely reversible: every variable in the mechanism
can be an effect variable, (2) partially reversible: two or more, but not all, of
the variables in the mechanism can be effect variables, (3) irreversible: only one
of the variables in the mechanism can be an effect variable, and (4) unknown: the
reversibility of the mechanism is unspecified, i.e., the modeler only knows that
variables in a mechanism are relevant, but does not know how they relate to each
other causally.

Definition 6. Given a mechanism $e$, let $EfVars(e) \subseteq Vars(e)$ be the
set of all possible effect variables of all causal mechanisms instantiated from
$e$. We say that $e$ is (1) completely reversible if $EfVars(e) = Vars(e)$ and
$|EfVars(e)| > 1$, (2) partially reversible if $1 < |EfVars(e)| < |Vars(e)|$,
(3) irreversible if $|EfVars(e)| = 1$, and (4) unknown if $|EfVars(e)| = 0$.
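Definition 6 translates directly into a small classification routine. The sketch below is only an illustration; the function name `classify_reversibility` and the string labels are our own choices, and mechanisms are represented simply by their sets of variable names.

```python
def classify_reversibility(variables, effect_variables):
    """Classify a mechanism by |EfVars(e)| relative to |Vars(e)| (Definition 6)."""
    variables, effect_variables = set(variables), set(effect_variables)
    if not effect_variables <= variables:
        raise ValueError("EfVars(e) must be a subset of Vars(e)")
    if not effect_variables:
        return "unknown"
    if len(effect_variables) == 1:
        return "irreversible"
    if effect_variables == variables:
        return "completely reversible"
    return "partially reversible"

# A mechanism over v1, v7, v8 in which only v1 or v8 may be the effect.
label = classify_reversibility({"v1", "v7", "v8"}, {"v1", "v8"})
```

Note that a mechanism with a single variable is classified as irreversible, since Definition 6 requires more than one effect variable for complete reversibility.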

We emphasize that the notion of reversibility of a mechanism is a semantic one
since it is defined with respect to the set of effect variables of a mechanism. A
functional relation may be reversible in the functional sense (invertible), but
may not be reversible in the causal sense [18, footnote 6]. For example, the ideal
gas law and Ohm’s law are given in [19, pp. 40] and [11, pp. 10] respectively as
examples of partially reversible mechanisms, although their functional relations
are invertible in general. Traditionally the reversibility of mechanisms is
considered mainly applicable to mechanical and physical systems [19, pp. 325],
since the concept is defined upon causal mechanisms, i.e., the invertibility of a
function is a necessary condition for the reversibility. In our formalization, we
define the concept of reversibility on the set of effect variables of a mechanism
so that we can apply the reversibility to other domains. For example, it would be
a mere coincidence that schooling, $s$, and verbal ability, $a$, can be described
as $s = f(a)$ in one context and $a = f^{-1}(s)$ in another. However, it is more
likely that $s = f(a)$ in one context and $a = g(s)$, where $g \neq f^{-1}$, in
another.

Notice that the notion of entity plays an essential role in our modeling. We 
should not confuse the reversibility of a mechanism with causal mixtures [2] in 
which members of entities may not share the same causal relationships. For
example, if the relation between schooling and verbal ability is modeled as a
causal mixture, we may find that schooling helps to increase verbal ability in one
subpopulation of students but verbal ability helps in getting higher schooling in
another. However, reversible mechanisms model the same entities in different
contexts. For example, verbal ability helps some population of students to get
higher schooling in one context, but in another context schooling helps the same
students to increase their verbal ability.

Taking the reversibility of mechanisms into account, we can define a causal 
model as follows. 

Definition 7. A causal model is a set of mechanisms
$E = \{e_1, e_2, \ldots, e_m\}$ such that there exists a set of causal
mechanisms $F = \{f_1, f_2, \ldots, f_m\}$ instantiated from $E$, where each
$f_i \in F$ is an instantiation of $e_i \in E$, and $F$ is a self-contained
structure.

Given a set of mechanisms $E$, we can test if $E$ can form a causal model by
checking whether there exists a self-contained $F$ instantiated from $E$. The
procedure, denoted as IsCausalModel($E$), first checks if $|E| = |Vars(E)|$. If
so, the procedure assumes that $E$ is a self-contained structure and applies COA
qualitatively on $E$'s structure matrix to generate the graph $G_E$. For each
node in $G_E$, the procedure checks if the mapped mechanisms have valid causal
mechanisms to be instantiated, i.e., if there exists a causal mechanism whose
effect variable is the same as the one depicted in $G_E$. If there exists a set
of causal mechanisms $F$, instantiated from $E$, whose effect variables are
consistent with $G_E$, the procedure verifies that $E$ is a causal model. In
order to assist modelers in hypothesizing causal relations in a mechanism whose
reversibility is unknown, the procedure treats its reversibility as completely
reversible. Notice that for those $E$ containing strongly coupled components, we
may have several instantiations $F$ from $E$. In other words, an irreversible
mechanism cannot participate in a strongly coupled component.

Example 2. Assume that the set of mechanisms $E = \{e_1, e_2, \ldots, e_8\}$
for the set of variables $V = \{v_1, v_2, \ldots, v_8\}$ shown in Fig. 1 is
stored in a knowledge base along with their causal mechanisms. In the knowledge
base, $e_6$ and $e_7$ are irreversible where $EfVars(e_6) = \{v_6\}$ and
$EfVars(e_7) = \{v_7\}$, $e_4$ and $e_5$ are completely reversible where
$EfVars(e_4) = Vars(e_4)$ and $EfVars(e_5) = Vars(e_5)$, and $e_8$ is partially
reversible where $EfVars(e_8) = \{v_1, v_8\}$. Consequently, $E$ is a causal
model since there exists a self-contained structure $F$ that can be instantiated
from $E$. However, if in the knowledge base we have $EfVars(e_7) = \{v_4\}$
instead of $EfVars(e_7) = \{v_7\}$, then $E$ is not a causal model since there is
no instantiation of $e_7$ that can make any instantiation $F$ of $E$
self-contained.
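The check in Example 2 can be approximated qualitatively by asking whether every mechanism can be assigned a distinct effect variable drawn from its $EfVars$. The sketch below makes that simplification explicit: it replaces the COA-based procedure described above with a bipartite matching (augmenting paths), which is our own reformulation rather than the authors' procedure. The names `FIG1`, `EF`, `find_instantiation`, and `is_causal_model` are hypothetical, and the $EfVars$ of the single-variable mechanisms $e_1$, $e_2$, $e_3$ are assumed to be their only variable.

```python
FIG1 = {
    "e1": {"v1"}, "e2": {"v2"}, "e3": {"v3"},
    "e4": {"v2", "v4", "v5"}, "e5": {"v2", "v3", "v4", "v5"},
    "e6": {"v3", "v5", "v6"}, "e7": {"v4", "v7"}, "e8": {"v1", "v7", "v8"},
}
# Reversibility assumed in Example 2 (e1..e3: assumption, single variable each).
EF = {
    "e1": {"v1"}, "e2": {"v2"}, "e3": {"v3"},
    "e4": {"v2", "v4", "v5"}, "e5": {"v2", "v3", "v4", "v5"},
    "e6": {"v6"}, "e7": {"v7"}, "e8": {"v1", "v8"},
}

def find_instantiation(ef_vars):
    """Give each mechanism a distinct effect variable, or return None."""
    owner = {}  # variable -> mechanism that currently uses it as its effect

    def assign(mech, seen):
        for v in sorted(ef_vars[mech]):
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its current owner can move elsewhere.
            if v not in owner or assign(owner[v], seen):
                owner[v] = mech
                return True
        return False

    for mech in sorted(ef_vars):
        if not assign(mech, set()):
            return None
    return {mech: v for v, mech in owner.items()}

def is_causal_model(variables_of, ef_vars):
    """Qualitative test: |E| = |Vars(E)| and a valid effect assignment exists."""
    all_vars = set().union(*variables_of.values())
    if len(variables_of) != len(all_vars):
        return False
    # Unknown reversibility (empty or missing EfVars) is treated as complete.
    candidates = {m: (ef_vars.get(m) or variables_of[m]) for m in variables_of}
    return find_instantiation(candidates) is not None

original_ok = is_causal_model(FIG1, EF)
altered_ok = is_causal_model(FIG1, {**EF, "e7": {"v4"}})
```

With the reversibility of Example 2, `original_ok` is true; with $EfVars(e_7) = \{v_4\}$, `altered_ok` is false, since $e_4$ and $e_5$ would then compete for $v_5$ and no mechanism could take $v_7$ as its effect.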

4 Actions in Causal Models

4.1 Representation of Actions 

Given a causal model that describes a system of interest, we may easily hypothesize 
different manipulations, such as “raise interest rate” or “reduce tax,” with the 
intention to influence the values of some target variables. Still, we may not know 
how other parts of the system may respond to these hypothetical manipulations. 
In other words, we suspect that our hypothetical manipulations will affect the 
variables of interest, which are the descendants of the manipulated variables in
the causal graph, but we are not certain how the equilibrium system will be disturbed
by our hypothetical manipulations. Therefore, the process of policy making usually 
focuses on deliberating the side effects of a manipulation. How should we represent 
an action in causal modeling to facilitate this deliberation? 

Pearl [12, pp. 225] suggested using the notation $do(q)$, where $q$ is a
proposition (variable), to denote an action, since people use phrases such as
“reduce tax” in daily language to express actions. More precisely, an atomic
action, denoted as $manipulate(v)$ in [2,16] and $do(v = \hat{v})$ in [12], is
invoked by an external force or agent to manipulate the variable $v$ by imposing
on it a probability distribution or holding it at a constant value,
$v = \hat{v}$, and replacing the causal mechanism, $v = f(Pa(v))$, that directly
governs $v$ in a causal model. The corresponding change in the causal graph is
depicted as an arc-cutting operation in which all incoming arcs to the
manipulated variable $v$ are removed [12,16]. Notice that the implicit
assumption behind the arc-cutting operation is that the manipulated variable is
governed by an irreversible mechanism, i.e., only $v$ can be an effect variable
in mechanism $e(v, Pa(v)) = 0$. In order to ensure that the manipulated causal
model is self-contained, the irreversible mechanism that governed the manipulated
variable has to be removed from the original model. However, when the
manipulated variable is governed by a reversible mechanism, the arc-cutting
operation may lead to inconsistent results. We therefore argue that an action in
causal modeling should be defined at the level of mechanisms, not propositions.

In the econometric literature (e.g., [10,13,17,18]), a system is represented as a
SEM, a set of structural equations, and actions are modeled as “scraping” invalid
equations and “replacing” them by new ones. In the STRIPS language [7], a
situation is represented by a state, a conjunction of function-free ground
literals (propositions), and actions are represented as PRECONDITION, ADD, and
DELETE lists which are conjunctions of literals. There is a clear analogy between
these two modeling formalisms, where the effects of actions are modeled
explicitly as adding or deleting fundamental building blocks, which are
mechanisms in SEM and propositions in STRIPS. We therefore directly translate
“scraping” into DELETE and “replacing” into ADD and define an action in causal
modeling as follows.

Definition 8. An action in causal modeling is a triple
$\langle PRECONDITION, ADD, DELETE \rangle$ where PRECONDITION is a causal model
$E$ and ADD and DELETE are the sets of mechanisms to be added to and removed
from $E$ respectively when applying the action to $E$.

We consider the context and the effects of an action explicitly in Definition 8. This 
is consistent with our daily dialogue where we talk about an action and its possible 
effects under a certain context. For example, the phrase “reduce tax” is usually 
stated in an economic context with some expectations about how economic units 
would react. 

Note that Definition 8 does not constrain the types or the number of mechanisms
that can be specified in the ADD and DELETE lists. There is also no guarantee
that the manipulated model will be a self-contained structure. However, the
atomic action defined in [12,16], which can be expressed explicitly as
$\langle E, \{v = \hat{v}\}, \{e(v, Pa(v)) = 0\} \rangle$ using our definition,
always derives a self-contained structure. We use the term atomic addition,
denoted as $add(v)$, to refer to the ADD list of an action that consists of only
one mechanism, $\{v = \hat{v}\}$, which expresses the manipulation on variable
$v$ in $E$. We use the term atomic deletion, denoted as $delete(e)$, to refer to
the DELETE list of an action that consists of only one mechanism $e$ in $E$. In
order to account for systems with mixtures of different mechanisms, we say that
an action is atomic if it consists of an atomic addition and an atomic deletion
such that the manipulated model is self-contained.
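The triple of Definition 8 maps naturally onto a small record type. The following sketch is one possible encoding under our own assumptions: mechanisms are identified by name only, the class `Action` and its `apply` method are hypothetical, and the example instance is the atomic action $\langle E, add(v_8), delete(e_1) \rangle$ on the model of Fig. 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """An action <PRECONDITION, ADD, DELETE> over named mechanisms."""
    precondition: frozenset  # the causal model E the action applies to
    add: frozenset           # mechanisms added to E
    delete: frozenset        # mechanisms removed from E

    def apply(self, model):
        """Return (E union ADD) minus DELETE, if the context matches."""
        if frozenset(model) != self.precondition:
            raise ValueError("action is not applicable to this model")
        return (frozenset(model) | self.add) - self.delete

    def is_atomic_shape(self):
        """True if both lists hold exactly one mechanism (self-containedness
        of the result is a separate check not performed here)."""
        return len(self.add) == 1 and len(self.delete) == 1

E = frozenset(f"e{i}" for i in range(1, 9))
action = Action(E, frozenset({"add(v8)"}), frozenset({"e1"}))
manipulated = action.apply(E)
```

Keeping the precondition inside the action makes the context explicit, in line with the remark that an action is stated together with the situation in which it is applied.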



4.2 Action Deliberation 

Once we choose to represent an action explicitly, including its effects and
context, the problem of predicting the effects of an action becomes the problem
of deciding which mechanisms should be specified in the ADD and DELETE lists. We
call this process action deliberation. In this section, we develop theorems to
facilitate the process of deliberating about an atomic action. Given a causal
model $E$, we seek to answer two new types of queries: (1) When making an
endogenous variable exogenous, which mechanisms
are possibly invalidated and can be removed from the model? (2) Which variables 
may be manipulated in order to invalidate and, effectively, remove a mechanism 
from a model? In other words, Query (1) assists modelers in modeling the effects 
of an action considering the manipulation alternatives at hand. Query (2), on 
the other hand, assists modelers in identifying the set of possible manipulation 
alternatives. We start by defining the set of minimal over-constrained equations 
that describes the situation where an atomic addition is added into a model. 

Definition 9. A set of over-constrained equations is minimal if it does not contain 
any over-constrained proper subsets itself. 

Lemma 1. Let $E$ be a self-contained structure and $add(v) = \{v = \hat{v}\}$ be
an atomic addition where $v \in EnVars(E)$. Let $E'_v = add(v) \cup E$. The set
of equations $O'_v = NS_v \cup add(v)$ is minimal over-constrained where $NS_v$
is the necessary structure of $v$ in $E$.

Lemma 1 states that an atomic addition makes a self-contained structure minimal
over-constrained. Next, we prove Lemma 2 to identify the set of equations such
that removing any one of them makes the set of minimal over-constrained equations
self-contained again.

Lemma 2. Given $O'_v$ of $E'_v$, deleting any equation $e \in NS_v$ makes
$O_v = O'_v \setminus e$ self-contained and consequently
$E_v = E'_v \setminus e$ self-contained.

Corollary 1. Given $E'_v = add(v) \cup E$, $E'_v$ will remain over-constrained
if none of the equations $e \in O'_v$ is removed.

Example 3. Consider the self-contained structure $E$ in Fig. 1. If we manipulate
variable $v_7$, i.e., $add(v_7)$, the resulting set of equations
$E'_{v_7} = E \cup add(v_7)$ becomes over-constrained. From Lemma 1, we know that
the set of equations $O'_{v_7} = \{e_2, e_3, e_4, e_5, e_7, add(v_7)\}$ is
minimal over-constrained. From Lemma 2, we know that removing any equation
$e \in \{e_2, e_3, e_4, e_5, e_7\}$ makes the remaining set of equations
$E_{v_7} = E'_{v_7} \setminus e$ a self-contained structure. If we instead
remove $e_6$, the set of equations $E'_{v_7} \setminus e_6$ remains
over-constrained according to Corollary 1.
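Example 3 can be checked mechanically at the qualitative level. The sketch below tests only the counting conditions, namely $|E| = |Vars(E)|$ and condition (1) of Definition 1 for every subset; condition (2) concerns the actual functions and is assumed to hold. The names `FIG1`, `is_self_contained`, and `restoring` are our own, and the subset enumeration is exponential, so this is suitable for small illustrations only.

```python
from itertools import combinations

FIG1 = {
    "e1": {"v1"}, "e2": {"v2"}, "e3": {"v3"},
    "e4": {"v2", "v4", "v5"}, "e5": {"v2", "v3", "v4", "v5"},
    "e6": {"v3", "v5", "v6"}, "e7": {"v4", "v7"}, "e8": {"v1", "v7", "v8"},
}

def is_self_contained(equations):
    """Qualitative test: |E| = |Vars(E)| and no subset has more equations
    than variables (condition (1) of Definition 1)."""
    names = sorted(equations)
    if len(names) != len(set().union(*equations.values())):
        return False
    for k in range(1, len(names)):
        for subset in combinations(names, k):
            if k > len(set().union(*(equations[e] for e in subset))):
                return False  # an over-constrained subset exists
    return True

# E'_{v7} = E union add(v7); the added mechanism mentions only v7.
manipulated = {**FIG1, "add(v7)": {"v7"}}

# Equations of E whose removal makes the manipulated set self-contained again.
restoring = {
    e for e in FIG1
    if is_self_contained({m: vs for m, vs in manipulated.items() if m != e})
}
```

Here `restoring` coincides with the necessary structure of $v_7$, as Lemma 2 and Corollary 1 state: removing $e_1$, $e_6$, or $e_8$ leaves the set over-constrained.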

Notice that Lemmas 1 and 2 hold for sets of equations. As stated in Sect. 3, a
self-contained structure is not necessarily a causal model unless it can be
instantiated from a set of mechanisms. Therefore, in order to deliberate about an
atomic action in a causal model, we need to verify that the manipulated set of
mechanisms is a causal model. In general, we can simply enumerate each mechanism
$e \in NS_v$ and use the procedure IsCausalModel($E_v$) outlined in Sect. 3 to
check if the manipulated model $E_v$ is a causal model. However, we observed that
the irreversibility of mechanisms allows us to find the set of possible atomic
deletions locally.

Consider an atomic addition $add(v)$ on a causal model
$E = \{e_1, e_2, \ldots, e_m\}$ and $v \in EnVars(E)$. When all mechanisms
governing $EnVars(NS_v)$ in $NS_v$ are completely reversible or unknown, we may
remove any one of the mechanisms in $NS_v$ to have a manipulated causal model.
When $v$ is directly governed by an irreversible mechanism $e$, we have to
remove $e$ since $v$ cannot be determined by $add(v)$ and $e$ simultaneously in
a manipulated model. In other words, the reversibility of the mechanism governing
the manipulated variable shrinks the set of possible atomic deletions from
$NS_v$ to $e$. We therefore learn that propagation of the effects of an atomic
addition in a causal model can be blocked by irreversible mechanisms. Now, we
prove Theorem 1 to answer Query (1).

Theorem 1. Consider an atomic addition $add(v)$ in a causal model
$E = \{e_1, e_2, \ldots, e_m\}$ and $v \in EnVars(E)$. There exists a non-empty
set of possible atomic deletions $D \subseteq NS_v$ such that deleting any
mechanism $d \in D$ derives the causal model $E_v = E \cup add(v) \setminus d$.

Semantically, Theorem 1 identifies the set of manipulated systems that are
self-contained. In other words, Theorem 1 assists modelers in hypothesizing a
system’s response toward a manipulation. Furthermore, we may find the set of
possible atomic deletions locally with respect to the order of complete
structures in $NS_v$. Namely, we perform IsCausalModel($E_v$) checking by
enumerating from the mechanisms governing the manipulated variable and
recursively up to those governing its ancestors in the causal graph until we
reach irreversible mechanisms.

Considering a completely reversible mechanical system, such as the power train
described in Sect. 1, a manipulation usually reflects a change of the operational
context, for example, from driving uphill to driving downhill. The manipulated
system normally responds by instantiating different causal mechanisms according
to the current operational context. Consequently, the mechanism being removed is
usually the one governing the exogenous variable in the system. However, if the
mechanism being removed was governing endogenous variables, it means that the
linkage among the set of variables is invalid in the manipulated system. For
example, the transmission or clutch between the engine and the wheels may be
broken. Consequently, the link between engine and wheel is no longer valid. We
therefore suggest that modelers use different enumeration orders to inspect the
set of possible atomic deletions in different applications. When a system
consists of irreversible mechanisms, Theorem 1 can further assist modelers in
deliberating about the set of possible atomic deletions locally.

           v1  v2  v3  v4  v5  v6  v7  v8
add(v8)    0   0   0   0   0   0   0   x
e2         0   x   0   0   0   0   0   0
e3         0   0   x   0   0   0   0   0
e4         0   x   0   x   x   0   0   0
e5         0   x   x   x   x   0   0   0
e6         0   0   x   0   x   x   0   0
e7         0   0   0   x   0   0   x   0
e8         x   0   0   0   0   0   x   x




Fig. 2. The structure matrix and its corresponding graph after the atomic action
$\langle E, add(v_8), delete(e_1) \rangle$.
Example 4. Consider the set of mechanisms in Fig. 1 and its reversibility assumed
in Example 2. The set of possible atomic deletions for manipulating variable
$v_8$, $add(v_8)$, is $\{e_1, e_8\}$ according to Theorem 1. Notice that the
irreversibility of mechanisms allows us to find the set of possible atomic
deletions in $\{e_1, e_7, e_8\}$ instead of $NS_{v_8}$. Moreover,
$E_{v_8} = E \cup add(v_8) \setminus e_7$ is not a causal model since $v_7$
cannot be an effect variable in $e_8$ according to the reversibility of $e_8$ in
the knowledge base. However, if we choose to remove $e_1$, $delete(e_1)$, the
manipulated model is shown in Fig. 2.

The dual theorem to Theorem 1 is to identify the set of possible atomic addi- 
tions given an atomic deletion, which answers Query (2). 

Theorem 2. Consider an atomic deletion $delete(e)$ for a causal model
$E = \{e_1, e_2, \ldots, e_m\}$ where $e \in E$. Let $G_E$ be the causal graph
of $E$. Let $e \in C^p_i$ and $C^p_i$ be mapped to $N^p_i$ in $G_E$. Let
$Des(N^p_i)$ be the descendants of $N^p_i$ in $G_E$. There exists a nonempty set
of variables $A \subseteq (Des(N^p_i) \cup N^p_i)$ such that manipulating any
variable $a \in A$ derives the causal model
$E_a = E \cup add(a) \setminus e$. The set of mechanisms
$\bigcup_{a \in A} add(a)$ is called the set of possible atomic additions.

Example 5. Consider the set of mechanisms in Fig. 1 and its reversibility assumed
in Example 2. The set of possible atomic additions for removing mechanism $e_4$,
$delete(e_4)$, is $\{v_4, v_5\}$ according to Theorem 2.
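Examples 4 and 5 can be reproduced by plain enumeration over the candidates that Theorems 1 and 2 allow. The sketch below is an illustration under stated assumptions, not the authors' local search: candidate sets (`NS_V8`, `CANDIDATES_E4`) are read off the causal graph of Fig. 1 by hand, the $EfVars$ of $e_1$, $e_2$, $e_3$ are assumed to be their single variable, and the causal-model test is reduced to finding distinct effect variables for all mechanisms by exhaustive search. All function names are hypothetical.

```python
from itertools import product

FIG1 = {
    "e1": {"v1"}, "e2": {"v2"}, "e3": {"v3"},
    "e4": {"v2", "v4", "v5"}, "e5": {"v2", "v3", "v4", "v5"},
    "e6": {"v3", "v5", "v6"}, "e7": {"v4", "v7"}, "e8": {"v1", "v7", "v8"},
}
EF = {  # reversibility of Example 2 (e1..e3 are our assumption)
    "e1": {"v1"}, "e2": {"v2"}, "e3": {"v3"},
    "e4": {"v2", "v4", "v5"}, "e5": {"v2", "v3", "v4", "v5"},
    "e6": {"v6"}, "e7": {"v7"}, "e8": {"v1", "v8"},
}

def has_instantiation(variables_of, ef_vars):
    """True if |E| = |Vars(E)| and each mechanism can take a distinct effect."""
    mechs = sorted(variables_of)
    if len(mechs) != len(set().union(*variables_of.values())):
        return False
    choices = product(*(sorted(ef_vars[m]) for m in mechs))
    return any(len(set(choice)) == len(mechs) for choice in choices)

def manipulate(v, removed):
    """Build E union add(v) minus the removed mechanism, with its EfVars."""
    variables_of = {m: vs for m, vs in FIG1.items() if m != removed}
    ef_vars = {m: EF[m] for m in variables_of}
    variables_of[f"add({v})"] = ef_vars[f"add({v})"] = {v}
    return variables_of, ef_vars

def possible_atomic_deletions(v, necessary_structure):
    return {d for d in necessary_structure if has_instantiation(*manipulate(v, d))}

def possible_atomic_additions(e, candidates):
    return {a for a in candidates if has_instantiation(*manipulate(a, e))}

NS_V8 = {"e1", "e2", "e3", "e4", "e5", "e7", "e8"}  # necessary structure of v8
CANDIDATES_E4 = {"v4", "v5", "v6", "v7", "v8"}      # node of e4 and descendants

deletions = possible_atomic_deletions("v8", NS_V8)
additions = possible_atomic_additions("e4", CANDIDATES_E4)
```

Under these assumptions `deletions` and `additions` agree with the sets stated in Examples 4 and 5. The exhaustive product is adequate here because only $e_4$, $e_5$, and $e_8$ have more than one possible effect variable.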

5 Discussion 

This paper formalizes the representation of reversibility of a mechanism to support 
modeling of changes in structure. We define the reversibility of a mechanism se- 
mantically on the set of possible effect variables. This definition allows us to extend 
the concept of reversible mechanisms from traditional mechanical and physical sys- 
tems to other systems. We further draw the analogy between the action represented 
in SEM and STRIPS languages to argue that the context and the effects of an ac- 
tion should be represented explicitly in causal modeling. Our formalization allows 
us to answer two new types of queries: (1) When manipulating a causal model, 
which mechanisms are possibly invalidated and can be removed from the model? 
(2) Which variables may be manipulated in order to invalidate and, effectively, 
remove a mechanism from a model? In practical applications, it may be desirable 
to further encode domain knowledge, such as whether a variable is manipulatable 
ethically and what is the cost of such manipulation, along with each mechanism. 
Abstract. Although we can build a belief network starting from any ordering
of its variables, its structure depends heavily on the ordering being
selected: the topology of the network, and therefore the number of con- 
ditional independence relationships that may be explicitly represented 
can vary greatly from one ordering to another. We develop an algorithm 
for learning belief networks composed of two main subprocesses: (a) an 
algorithm that estimates a causal ordering and (b) an algorithm for learn- 
ing a belief network given the previous ordering, each one working over 
different search spaces, the ordering and dag space respectively. 



1 Introduction 

Belief Networks (also called Bayesian Networks or causal networks) are 
Knowledge-Based Systems that represent uncertain knowledge by means of both 
graphical structures and numerical parameters. In a belief network, the quali- 
tative component is a directed acyclic graph (dag), where the nodes represent 
the variables in the domain, and the arrows represent dependence or causality 
relationships among the variables. The quantitative component is a collection 
of conditional probability measures, which measure our uncertainty [15]. The
reasons for the success of belief networks are that they allow us (i) to represent
the available information in an intelligible way (using causal relationships),
(ii) to decompose and store the information efficiently (by means of independence
relationships) and (iii) to perform inference tasks.

One of the most interesting problems when dealing with belief networks is 
that of developing methods capable of learning the network directly from data. 
Since learning belief networks is NP-hard [12], any kind of prior information
about the model to be recovered may be quite useful in facilitating the learning
process. This information may be an ordering of the variables in the network
[2,10,13,17] or knowledge about the (possible) presence of some causal or
(in)dependence relationships [16]. Perhaps an expert may provide this kind of
information, but the development of tools capable of obtaining this information
as a first step to the learning process is clearly an interesting task.

In this work we focus on the problem of learning belief networks by first
obtaining a good ordering on the set of variables. In general, if we look for
an optimal ordering then obtaining it may require as much information as the
learning of the complete structure itself, and the computation may be quite
complex as well [6,14]. So, we propose to use only partial (and easily available)
information about the problem in order to get a ‘good’ approximation of the 
ordering. The type of partial information we use will be a subset of the set of de- 
pendence/independence relationships that could be represented in the network 
(more precisely, marginal and conditional (in)dependence relationships of order 
one), and the method to perform the search of the ordering will be simulated 
annealing. Once we have obtained an ordering, it will be supplied to an algo- 
rithm for learning belief networks that will use the ordering to reduce the search 
space. This algorithm, called BENEDiCT-step [3], is based on a (hybrid) method- 
ology which is a combination of the methods based on independence criteria and 
the ones based on scoring metrics. 

The rest of the paper is organized as follows: in Section 2 we briefly recall some
general ideas about belief networks, and some basic concepts about simulated
annealing. In the next two sections we describe the components of our method:
in Section 3 we present our algorithm to estimate an ordering and Section 4 
describes the algorithm BENEDiCT-step. Section 5 shows some experiments with 
the proposed method. Finally, Section 6 contains the concluding remarks. 



2 Preliminaries 

Given a belief network $G$, we can extract an ordering $\theta$ for its variables
in the following way: if there is an arrow $x_j \to x_i$ in the graph then $x_j$
precedes $x_i$ in the ordering $\theta$, i.e., $\theta(x_j) < \theta(x_i)$. Such
an ordering $\theta$ is a causal ordering [6]. It is interesting to note that,
given a dag, the causal ordering is not unique; for example
$\theta_1 = \{x_1, x_2, x_3, x_4, x_5, x_6\}$ and
$\theta_2 = \{x_1, x_4, x_2, x_3, x_5, x_6\}$ are two valid causal orderings for
the first network in Figure 1.
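The condition that defines a causal ordering is easy to test programmatically. In the sketch below the arc list `ARCS` is a hypothetical dag chosen only for illustration, since the network of Figure 1 is not reproduced in the text; the function name `is_causal_ordering` is likewise our own.

```python
def is_causal_ordering(ordering, arcs):
    """True if every arc (parent, child) has the parent earlier in the ordering."""
    position = {x: i for i, x in enumerate(ordering)}
    return all(position[parent] < position[child] for parent, child in arcs)

# Hypothetical dag over x1..x6 (NOT the dag of Figure 1).
ARCS = [("x1", "x2"), ("x1", "x3"), ("x2", "x5"),
        ("x3", "x5"), ("x4", "x5"), ("x5", "x6")]

theta_a = ["x1", "x2", "x3", "x4", "x5", "x6"]
theta_b = ["x1", "x4", "x2", "x3", "x5", "x6"]
theta_c = ["x6", "x3", "x2", "x4", "x1", "x5"]

results = [is_causal_ordering(t, ARCS) for t in (theta_a, theta_b, theta_c)]
```

For this assumed dag the first two orderings are causal and the third is not, which illustrates that a dag generally admits several causal orderings while most permutations of its variables are not causal.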

Given any ordering $\theta$, the Markov condition provides a systematic (but
impractical) method to build a belief network [15]: for each node $x_i$, assign,
as the parents of $x_i$ in the dag, the minimal subset of predecessors of $x_i$
in the ordering $\theta$ which makes $x_i$ conditionally independent of the rest
of its predecessors.





Fig. 1. Original dag and those obtained by using orderings $\theta_2$ and $\theta_3$
However, different orderings may give rise to different networks. For example,
let us start from the network in the left hand side of Figure 1. Let
$\theta_1 = \{x_1, x_2, x_3, x_4, x_5, x_6\}$,
$\theta_2 = \{x_4, x_2, x_1, x_3, x_5, x_6\}$ and
$\theta_3 = \{x_6, x_3, x_2, x_4, x_1, x_5\}$ be three different orderings. If we
apply the previous process, for $\theta_1$ we recover the original graph, for
$\theta_2$ we obtain the second graph, and for the ordering $\theta_3$ we get the
much denser graph on the right hand side of the same figure.

After assigning the corresponding conditional probabilities to the nodes, the
three models represent the same joint probability distribution. However, the set
of independence relationships represented in these dags is not the same. In the
graph associated to θ_3 only a few independences are preserved, whereas using θ_2
we get the same set of dependence/independence relationships as in the original
model (the dags corresponding to θ_1 and θ_2 are equivalent according to [18]).

2.1 Learning belief networks 

There are a large number of algorithms for learning belief networks from data.
However, they can be grouped into two main approaches: methods based on
conditional independence tests, and methods based on a scoring metric.

The algorithms based on independence tests (also called constraint-based) 
carry out a qualitative study on the dependence and independence properties 
among the variables in the domain, and then they try to find a network repre- 
senting most of these properties. The number and complexity of the tests are 
critical for the efficiency and reliability of these methods. Some of the algorithms 
based on this approach can be found in [10,11,16].

The algorithms based on scoring metrics try to find a graph which has the
minimum number of links that 'best' represents the data according to their own
metric. They all use a function (the scoring metric) that measures the quality
of each candidate structure and a heuristic search method to explore the space
of possible solutions. The algorithms that use this approach when the search is
in the space of general dags almost invariably use greedy searches. The scoring
metrics are based on different principles, such as entropy, Bayesian approaches
or Minimum Description Length [7].

2.2 Simulated Annealing 

In this section we briefly recall some basic ideas about simulated annealing, the 
search method we shall use to find a good ordering for the variables in a belief 
network. 

The idea behind simulated annealing [4] is to model numerically the physical
annealing process of solids in order to solve optimization problems.

Consider a system composed of N variables and a function E to optimize
(called the energy function). Our purpose is to find a configuration c of the N
variables that minimizes (or maximizes) the function E. Starting from a random
configuration c_i, representing the current state, we can compute the energy
E(c_i), which measures the 'quality' of this configuration. A new configuration
c_j can be obtained by applying a perturbation mechanism to c_i. Let E(c_j)
be the energy of this state, and ΔE the difference of energy, i.e.,
ΔE = E(c_j) - E(c_i). If the energy decreases, ΔE < 0, we accept c_j as the new
current state; otherwise c_j is only accepted with a probability given by
exp(-ΔE/T), T being the temperature, a control parameter that decreases with
time. This criterion allows an 'uphill climb' from a configuration with a lower
energy to another with higher energy, thus preventing the process from being
trapped at local minima. The procedure continues the search until a stopping
criterion is satisfied. This criterion may be based on the final temperature
(close to zero), the value of the energy function, or a fixed number of
iterations.
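
The acceptance rule and the decreasing temperature can be sketched as a generic minimization loop. This is only an illustrative sketch: the parameter names and default values (`t0`, `alpha`, `steps_per_temp`, `t_min`), the geometric cooling schedule, and the toy energy function are assumptions made for the example, not settings reported in the paper.

```python
import math
import random


def simulated_annealing(initial, energy, perturb, t0=1.0, alpha=0.9,
                        steps_per_temp=50, t_min=1e-3, rng=None):
    """Minimize `energy` starting from `initial`; return (best, best_energy)."""
    rng = rng or random.Random()
    current, e_current = initial, energy(initial)
    best, e_best = current, e_current
    t = t0
    while t > t_min:
        for _ in range(steps_per_temp):
            candidate = perturb(current, rng)
            delta = energy(candidate) - e_current
            # Always accept improvements; accept a worse state with
            # probability exp(-delta / t), which shrinks as t decreases.
            if delta < 0 or rng.random() < math.exp(-delta / t):
                current, e_current = candidate, e_current + delta
                if e_current < e_best:
                    best, e_best = current, e_current
        t *= alpha  # assumed geometric cooling
    return best, e_best


# Toy usage (assumed example): minimize (x - 3)^2 over the integers.
best, e = simulated_annealing(
    initial=20,
    energy=lambda x: (x - 3) ** 2,
    perturb=lambda x, rng: x + rng.choice((-1, 1)),
    rng=random.Random(0),
)
print(best, e)
```

Keeping track of the best state seen so far is a common practical addition; the basic scheme described above only keeps the current state.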

3 Approximating a Causal Ordering 

We seek to find a good causal ordering for the variables in an (unknown) belief
network. Given any ordering, it is possible to build a belief network representing
the joint probability distribution, this network being an Independence map [15]
of the underlying probabilistic model. However, the density of the resultant dag
may change drastically depending on the selected ordering. Our goal is to find an
ordering able to represent as many true independence relationships as possible.
Given this ordering, the search space to find an optimal belief network reduces
considerably.

Taking into account that for a network with n nodes, the size of the set 
of candidate orderings is n!, the task of finding an optimal ordering may be 
quite complex. Several approaches to deal with this problem can be found in the 
literature: 

— Singh and Valtorta [17] use conditional independence tests to learn a draft 
of the network, which is then used to get an ordering. Next, they utilize the 
K2 algorithm [13] to learn the network. 

— Bouckaert [6] proposes an algorithm which takes as the input a complete 
dependence model and an initial ordering, and gives as the output an optimal 
causal ordering. 

— Larrañaga et al. [14] use a genetic algorithm to search for the best
ordering. Each element of the population is a possible ordering, and their
fitness function is the K2 metric.

Our approach is situated between the works of Singh and Valtorta and those
of Larrañaga et al. The basic idea is to use only a subset of the (in)dependence
relationships of the model to learn a draft of the network and next apply a
combinatorial optimization tool to search for the ordering which preserves as
many of these dependences and independences as possible.

When dealing with conditional independence relationships whose true values
have to be estimated from a database by means of conditional independence
tests, two problems appear: the number of tests and their order (i.e., the
number of variables involved in the conditioning set). On one hand, the number
of conditional independence tests that may be necessary to perform can increase
exponentially with the number of variables; on the other hand, computing the
truth value of a conditional independence test requires a number of calculations
which grows exponentially with the order of the test. Moreover, another problem
is related not to efficiency but to reliability: conditional independence tests
of high order are not reliable unless the size of the database is enormous. So,
it may be interesting to restrict the kind of conditional independence tests that
we are going to carry out to tests of low order.

We propose using only conditional independence tests of order zero and one
(i.e., I(x_i, x_j | ∅) and I(x_i, x_j | x_k), respectively) for several reasons:
i) these tests are quite reliable even for moderate datasets, ii) the number of
tests is polynomial, O(n^3), and iii) this set of independences is quite
expressive for sparse structures, such as those we usually find in real
applications. These independences are sufficient even for characterizing and
learning some specific kinds of belief networks [8,9]. We shall call 0-1
independences the set of conditional independence relationships of order zero
and one which are true for a given model, also denoted as I_{0-1}.
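
As a quick check of the polynomial bound, count each unordered pair once for the order-zero test and once for every single conditioning variable among the remaining n - 2. The numerical instance below uses n = 37, the size of the Alarm network considered in Section 5; the counting convention (unordered pairs) is an assumption made for this illustration.

```latex
\binom{n}{2} + (n-2)\binom{n}{2} \;=\; \frac{n(n-1)^2}{2} \;=\; O(n^3),
\qquad
n = 37:\quad 666 + 35 \cdot 666 \;=\; 666 + 23310 \;=\; 23976 .
```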

Our algorithm will take the set I_{0-1} obtained from the data set as the
input. In an initialization step, we build an undirected graph (denoted G_{0-1})
as a basic skeleton of the network: starting from the complete undirected graph,
we remove those links x_i - x_j such that there is a 0-1 independence between
x_i and x_j in I_{0-1}. For example, let us suppose that the underlying model is
isomorphic to the graph I) in Figure 2. In this case the set of 0-1 independences
is I_{0-1} = {I(x_2, x_3 | x_1)}. The initialization step produces the undirected
graph II) in Figure 2. In a second step we shall execute the search process,
which tries to find an optimal ordering. For any ordering θ being considered, we
direct the skeleton G_{0-1} as follows: if x_i - x_j ∈ G_{0-1} and
θ(x_j) < θ(x_i), then we direct the link as x_j → x_i. For the example in
Figure 2, let us consider the following orderings: θ_1 = {x_1, x_2, x_3, x_4};
θ_2 = {x_2, x_3, x_4, x_1}; θ_3 = {x_1, x_2, x_4, x_3} and
θ_4 = {x_3, x_1, x_2, x_4}. Using these orderings we obtain the dags III), IV),
V) and VI) respectively in Figure 2.
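
The two steps just described (pruning the complete graph, then orienting it by an ordering) can be sketched as follows. This is an illustrative sketch, assuming the four-variable example with the single independence I(x2, x3 | x1); the function names `build_skeleton` and `direct_skeleton` and the triple encoding of independences are hypothetical choices made here.

```python
from itertools import combinations


def build_skeleton(variables, independences):
    """Complete undirected graph minus every pair with a 0-1 independence.

    `independences` holds triples (xi, xj, z), where z is None for an
    order-zero statement or a single conditioning variable for order one.
    """
    separated = {frozenset((xi, xj)) for xi, xj, _ in independences}
    return {frozenset(pair) for pair in combinations(variables, 2)
            if frozenset(pair) not in separated}


def direct_skeleton(skeleton, ordering):
    """Orient each link from the earlier to the later variable in `ordering`."""
    position = {v: k for k, v in enumerate(ordering)}
    return {tuple(sorted(link, key=position.__getitem__)) for link in skeleton}


variables = ["x1", "x2", "x3", "x4"]
i01 = {("x2", "x3", "x1")}  # I(x2, x3 | x1)
skeleton = build_skeleton(variables, i01)
dag = direct_skeleton(skeleton, ["x1", "x2", "x3", "x4"])
print(sorted(dag))
```

Because every link is oriented from the earlier to the later variable, the result is acyclic for any ordering.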




Fig. 2. Different orderings of G_{0-1}




The Search of Causal Orderings: A Short Cut for Learning Belief Networks 221 



Now, let us describe the different components of the search process: 

• Energy function: For each configuration (ordering) θ we try to measure
the degree g(θ) to which, after directing G_{0-1} according to the ordering θ
(thus obtaining a dag G^θ_{0-1}), the dependence and independence relationships
in I_{0-1} are preserved in the dag G^θ_{0-1}. Let us denote by I^θ_{0-1} the set
of independence relationships of order zero and one that are valid in G^θ_{0-1}
(using d-separation) and by ⟨·,·|·⟩ any d-separation statement in a dag. So, we
count the number of dependence and independence relationships that are true in
I_{0-1} but not in I^θ_{0-1}. Therefore, our energy function is:

g(\theta) = \sum_{x_i, x_j \mid x_i \neq x_j} \bigl( I(x_i, x_j \mid \emptyset) \otimes \langle x_i, x_j \mid \emptyset \rangle \bigr)
          + \sum_{x_i, x_j, x_k \mid x_i \neq x_j \neq x_k} \bigl( I(x_i, x_j \mid x_k) \otimes \langle x_i, x_j \mid x_k \rangle \bigr)    (1)

where we assume that an independence relationship takes on a binary value
(1 for dependence, 0 for independence) and ⊗ corresponds to the exclusive-or
operator^1. A value g(θ) = 0 means that I_{0-1} and I^θ_{0-1} are equivalent, and
the greater the value of g(θ), the greater the number of dependence and
independence relationships that are not preserved. Obviously, we shall prefer
those orderings giving a value of g(θ) as low as possible. For the example in
Figure 2, we have g(θ_1) = 0, g(θ_2) = 2, g(θ_3) = 1 and g(θ_4) = 0, thus θ_1
and θ_4 are the preferred orderings.

• Perturbation mechanism: Each configuration representing an ordering θ is
codified as a chain of variables: when x_j appears before x_i in the chain, x_j
precedes x_i in θ. Given a configuration, the new configuration is obtained by
modifying a randomly selected segment s in the current configuration. Two
mechanisms (each selected at random with probability 0.5) have been
implemented. The first one is a transportation function that moves the segment
to a new random position p (interchanging the elements); the second one is the
inversion function, which inverts the ordering of the variables within the
segment.

• Temperature function: A proportional decreasing function has been
implemented, i.e., T_k = αT_{k-1}, where α ∈ (0, 1) and T_0 is a fixed initial
temperature.

• Stopping criterion: The algorithm stops when: i) all the 0-1 independences
have been captured by the current configuration θ, ii) the fitness is not modified
after two consecutive iterations or iii) the process has been iterated 10 times.
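
The energy function and the perturbation mechanism can be sketched for the four-variable example of Figure 2. This is a sketch under stated assumptions: the d-separation test uses the standard moralized ancestral graph criterion (one of several equivalent ways to decide d-separation, not necessarily the one used by the authors), the skeleton is written out by hand, independences are encoded as triples (x_i, x_j, x_k) listed in the same variable order as `variables`, and the exact way the segment is "transported" is one plausible reading of the description. All function names are hypothetical.

```python
import random
from itertools import combinations


def d_separated(arcs, x, y, z):
    """True if x and y are d-separated by the set z in the dag `arcs`."""
    parents = {}
    for u, v in arcs:
        parents.setdefault(v, set()).add(u)
    keep, stack = set(), [x, y, *z]          # x, y, z and their ancestors
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))
    adj = {n: set() for n in keep}           # moral graph of that subgraph
    for child in keep:
        pa = parents.get(child, set())
        for p in pa:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(pa, 2):     # marry co-parents
            adj[p].add(q)
            adj[q].add(p)
    seen, stack = {x}, [x]                   # search for a path avoiding z
    while stack:
        node = stack.pop()
        for nb in adj[node] - seen - set(z):
            if nb == y:
                return False
            seen.add(nb)
            stack.append(nb)
    return True


def energy(variables, i01, arcs):
    """Count order-0 and order-1 statements where model and dag disagree."""
    mismatches = 0
    for xi, xj in combinations(variables, 2):
        for xk in [None] + [v for v in variables if v not in (xi, xj)]:
            in_model = (xi, xj, xk) in i01
            in_dag = d_separated(arcs, xi, xj, [] if xk is None else [xk])
            mismatches += in_model != in_dag  # exclusive-or
    return mismatches


def direct(skeleton, ordering):
    """Orient every link from the earlier to the later variable."""
    pos = {v: k for k, v in enumerate(ordering)}
    return [(u, v) if pos[u] < pos[v] else (v, u) for u, v in skeleton]


def perturb(ordering, rng):
    """Move or invert a random segment, each with probability 0.5."""
    i, j = sorted(rng.sample(range(len(ordering) + 1), 2))
    segment, rest = ordering[i:j], ordering[:i] + ordering[j:]
    if rng.random() < 0.5:
        p = rng.randrange(len(rest) + 1)     # transport to position p
        return rest[:p] + segment + rest[p:]
    return ordering[:i] + segment[::-1] + ordering[j:]


variables = ["x1", "x2", "x3", "x4"]
i01 = {("x2", "x3", "x1")}                   # I(x2, x3 | x1)
skeleton = [("x1", "x2"), ("x1", "x3"), ("x1", "x4"), ("x2", "x4"), ("x3", "x4")]
orderings = [["x1", "x2", "x3", "x4"], ["x2", "x3", "x4", "x1"],
             ["x1", "x2", "x4", "x3"], ["x3", "x1", "x2", "x4"]]
energies = [energy(variables, i01, direct(skeleton, th)) for th in orderings]
print(energies)  # [0, 2, 1, 0], the values quoted for the Figure 2 example
neighbour = perturb(orderings[0], random.Random(1))
```

The energies reproduce the values given in the text for the four orderings, which supports the reading of the energy function as a count over unordered pairs.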

4 Learning Belief Networks with a Given Ordering 

The algorithm we are going to describe, BENEDICT-step, utilizes a hybrid
methodology: it uses a specific metric and a search procedure (so, it belongs to
the group of methods based on scoring metrics), although it also explicitly makes
use of the conditional independences embodied in the topology of the network to
elaborate the scoring metric and carries out independence tests to limit the
search effort (hence it also has strong similarities with the algorithms based on
independence tests). It is part of a family of algorithms [2,3] that share a
common methodology for learning belief networks, which we have called BENEDICT.

^1 We also tried more quantitative ways of evaluating the goodness of the ordering. The
idea is that a link may actually represent a very weak correlation, so its absence may
not be as important as the absence of other links representing strong correlations.
However, the best results were obtained by using the qualitative measure of eq. (1).

Let us briefly describe the BENEDICT methodology. The basic idea is to
measure the discrepancies between the conditional independences (d-separation
statements) represented in any given candidate network G and the ones displayed
by the database D. The smaller these discrepancies are, the better the network
fits the data. The aggregation of all these (local) discrepancies will result in a
measure g(G, D) of global discrepancy between the network and the database.

To measure the discrepancy of each one of the independences in the graphical
model and the numerical model (the database), BENEDICT uses the
Kullback-Leibler cross entropy:

Dep(X, Y \mid Z) = \sum_{x, y, z} P(x, y, z) \log \frac{P(x, y \mid z)}{P(x \mid z)\, P(y \mid z)}

where x, y, z denote instantiations of the sets of variables X, Y and Z
respectively, and P is a probability estimated from the database.
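
A dependence degree of this kind can be estimated directly from frequency counts. The sketch below assumes the standard conditional cross-entropy form, Dep(X, Y | Z) = Σ P(x, y, z) log [P(x, y | z) / (P(x | z) P(y | z))], with maximum-likelihood estimates of P; the function name `dep` and the representation of the database as a list of dictionaries are assumptions made for illustration.

```python
import math
from collections import Counter


def dep(cases, x, y, z=()):
    """Empirical Dep(x, y | z) computed from a list of dict-like cases."""
    n = len(cases)
    n_xyz, n_xz, n_yz, n_z = Counter(), Counter(), Counter(), Counter()
    for case in cases:
        zv = tuple(case[v] for v in z)
        n_xyz[case[x], case[y], zv] += 1
        n_xz[case[x], zv] += 1
        n_yz[case[y], zv] += 1
        n_z[zv] += 1
    total = 0.0
    for (xv, yv, zv), count in n_xyz.items():
        # P(x,y|z) / (P(x|z) P(y|z)) = N(x,y,z) N(z) / (N(x,z) N(y,z))
        ratio = count * n_z[zv] / (n_xz[xv, zv] * n_yz[yv, zv])
        total += (count / n) * math.log(ratio)
    return total


# Toy data (assumed): b copies a in `copied`; a and b are unrelated in `free`.
copied = [{"a": 0, "b": 0}, {"a": 1, "b": 1}]
free = [{"a": i, "b": j} for i in (0, 1) for j in (0, 1)]
print(dep(copied, "a", "b"), dep(free, "a", "b"))
```

The measure is zero when the empirical distribution satisfies the independence exactly and grows as the data depart from it.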

As the number and complexity of the d-separation statements in a dag G may
grow exponentially with the number of nodes, we cannot use all the d-separations
displayed by G, but some selected subset of 'representative' d-separation
statements. Given any candidate network G, BENEDICT will take into account the
conditional independences for every two non-adjacent single variables, x_i and
x_j, given the set of minimum size, S_G(x_i, x_j), that d-separates x_i and x_j
[1]. Finding this set takes some additional effort, but it is compensated by a
decrease in the computing time of the corresponding dependence degree.
Moreover, it also increases the reliability of the results, because less data is
needed to reliably compute a conditional dependence measure of lower order. The
method BENEDICT uses for efficiently finding the sets S_G(x_i, x_j) is described
in [1].

In order to give a score to a specific network structure G given a database
D, BENEDICT uses the aggregation (the sum) of the local discrepancies as the
measure of global discrepancy g(G, D) (which has to be minimized). Finally, the
type of search method used by BENEDICT is a simple greedy search that inserts
into the structure the candidate arc that produces the greatest improvement
of the score (removal of arcs is not permitted).

Let us describe more specifically the algorithm BENEDICT-step. It works under
the assumption that the total ordering of the variables is known (this ordering
θ is just the one obtained by the simulated annealing algorithm). BENEDICT-step
consists of a process composed of n steps, where each step i represents the
inclusion of a new node x_i (the i-th node in the ordering θ) in the (initially
empty) structure and the inclusion of the necessary arcs to construct the best
graph with i nodes, G_i.

At each step i only the d-separation relationships between x_i, the node just
introduced, and the previous ones are considered, hence the metric used by
BENEDICT-step is

g(G_i, D) = \sum_{x_j <_\theta x_i \,\wedge\, x_j \notin Pa_{G_i}(x_i)} Dep(x_i, x_j \mid S_{G_i}(x_i, x_j))    (2)
Every step i is composed of a series of substeps. Each substep looks for the
arc whose addition to the current graph results in the greatest decrease of the
discrepancy measure between the new graph (the one with the added arc) and
the data. The process continues in this way, adding at each substep the single
arc x_j → x_i, x_j <_θ x_i, which most decreases the discrepancy, until the
stopping condition holds.

This condition is related to the fact that the algorithm also uses
independence tests to remove candidate arcs; in this way, the process stops
naturally when there are no more candidate arcs to consider (either because they
are already inserted into the structure or because their extreme nodes are found
to be independent). At the end of the algorithm a pruning process (also based
on independence tests) is triggered (see [3] for details). This pruning partially
overcomes some of the problems due to the use of an irrevocable search strategy.

5 Experimental Results 

We will consider the performance of the proposed methodology in recovering the
so-called Alarm belief network (see Figure 3), which has been widely used as a
benchmark for evaluating learning algorithms. This network contains 37 variables
and 46 arcs. The input data commonly used are subsets of the Alarm database,
which contains 20,000 cases; specifically, we used the first 3,000 cases in our
experiments.




Fig. 3. The Alarm network 

As we explained, the learning process is divided into two main processes. Let
us first analyze the results of each one separately, and then the final results of
the whole process.

The first subprocess consists of searching for the 'best' ordering, θ, of the
variables using a simulated annealing algorithm. The fitness value used was the
number (as a percentage) of 0-1 (in)dependences preserved by the current
ordering. In order to measure the quality of the ordering we compare it with the
Alarm 'correct' ordering. Due to the stochastic nature of the simulated
annealing algorithm, we ran the algorithm several times with the same training
set. In every case the final fitness was 97.0%, resulting in different but
indistinguishable orderings (one of these orderings is the 'correct' ordering).
After analyzing the orderings obtained in our experiments we can draw some
conclusions:

1. The degree to which the database reflects the set of 0-1 (in)dependences in
the true model is important for getting a good output ordering.

2. For those subsets W of variables in the model with no 0-1 independence
relationships, we do not have enough information to discriminate which
partial orderings, involving the variables in W, are the correct ones. We found
different orderings, involving changes in the relative ordering of variables in
the set {35, 15, 34}, with the same fitness value. As we will see, the study of
these orderings will be relevant in order to get a better network.

3. There are several orderings with the same fitness value that are
indistinguishable even with more information. For example, considering the
variables 4 and 19, regardless of the relative ordering used, we get equivalent
structures.

The second subprocess consists of letting the algorithm BENEDICT-step learn
the structure of the network using an ordering θ. In order to analyze its
behaviour, we supply the BENEDICT-step algorithm with the 'correct' ordering
and the same training data. From the topology of the learned network we observe:
a) There are three missing arcs, 11 → 27, 12 → 32 and 21 → 31. The last
two arcs are not strongly supported by the data, as has been reported by several
authors. b) There is one extra arc between variables 31 and 27, which have not
been determined as independent by the independence tests; this arc is added while
trying to compensate for the loss of the arc 11 → 27 (a total of 4 arcs different
from the original model). We have also computed several other measures to
evaluate the quality of the learned network from different points of view. These
measures are: 1) the Kullback^2 distance between the probability distribution
associated to the database and the probability distribution associated to the
learned network; 2) the K2 metric [13] (log version); and 3) the BIC metric
(Bayesian Information Criterion), which includes a penalty term. Finally we
compare all these collected measures with those obtained by the K2 algorithm
[13], running both algorithms under the same conditions. The results shown in
the first row of Table 1 allow us to conclude that our algorithm is competitive
and recovers a good model.

Now we are going to analyze the results obtained in the whole process.
Usually, when no ordering is known, the learning algorithm has to cope with the
entire dag space to learn the network. We can make a comparison^3 between the
two-step method proposed and the single search process. For that purpose we
have used a constraint-based algorithm, BN Power Constructor (BNPC) [11] (we
use the software package available at
http://www.cs.ualberta.ca/~jcheng/bnsoft.htm). In Table 2 we show the results
obtained by the BNPC algorithm, which are worse than those obtained by any of
the different orderings θ_i used as input to the algorithms BENEDICT-step and
K2. All these orderings were score-equivalent for the simulated annealing process.

^2 Actually, we have calculated a decreasing monotonic transformation of the Kullback
distance computed in a very efficient form [8]. The interpretation is: the higher this
parameter, the better the network.

^3 We do not compare the running times because the three algorithms considered run
on different platforms and are implemented using different programming languages.
In order to focus on how some local misplacements in the ordering can modify
the resulting learned network, we have studied the possible cases found by
simulated annealing involving the relative orderings between 34, 35 and 15. As
we said, simulated annealing is not able to discriminate among these orderings,
thus any configuration would be a possible input to the BENEDICT-step algorithm.
As we could expect, they give rise to different networks. Table 1 presents the
results obtained when we consider three orderings that differ only in the relative
ordering of the variables 34, 35 and 15. As we can observe, some of the orderings
are worse than others, and the output belief network structure depends on
how lucky we were in the first subprocess. As the ordering is obtained by using
only partial information (0-1 independences), its quality might be questioned.
Fortunately, we do not have to reconsider the global ordering. In the general
case, the set of 0-1 independences is quite significant, so the learned structure
is a 'good' representation of the model. In any case, we considered whether
BENEDICT-step, given more information, could detect some misplacements between
variables in θ and could discriminate among indistinguishable orderings. Note
that a 'bad' ordering (as for example the ordering θ_3 in Figure 1) tends to
create cliques (we consider cliques having at least three nodes) among the
variables involved.



Table 1. Comparison between the algorithms BENEDICT-step and K2

            BENEDICT-step                        K2
Ordering    Kullback  Hamming  BIC     K2        Kullback  Hamming  BIC     K2        Relative ordering
'correct'   9.20      4        -33919  -14425    9.23      2        -34351  -14424
θ_1         9.11      14       -34830  -14624    9.21      12       -35697  -14520    34 ≺ 15 ≺ 35
θ_2         9.18      12       -34606  -14533    9.23      8        -36205  -14494    34 ≺ 35 ≺ 15
θ_3         9.21      8        -34358  -14479    9.23      5        -35203  -14450    35 ≺ 34 ≺ 15



Table 2. Performance measures for the network learned by BNPC

Kullback  Hamming  BIC     K2
9.12      7        -35197  -14541


We have developed a heuristic rule that, using the information stored in the
output network, allows us to obtain a sparser representation of the same model.
This refinement is carried out by determining local rearrangements in θ, which
give rise to local changes in the structure while improving the quality of the
output network. Basically, the heuristic consists of selecting a variable x_i in a
clique and generating a new ordering θ*, where this variable changes its relative
position with respect to some variables in the clique.

In Table 1, from the ordering θ_1 to θ_3, we can follow the steps of our heuristic
focused on variables 34, 35 and 15. We make the comparisons taking as reference
the structure obtained by BENEDICT-step when it uses the Alarm 'correct'
ordering as the input (the initial erroneous arcs remain). Thus, taking the worst
ordering, 34 ≺ 15 ≺ 35, as the input, our first change involves the variables 15
and 35 (the last two variables in the clique), giving rise to a sparser network
with a better fitness, which is accepted as the current one. Then, using
the same reasoning, the change between variables 35 and 34 is performed. Note
that the resulting ordering is the correct relative ordering. The algorithm stops
at this point.

6 Concluding Remarks 

We have addressed the problem of learning belief networks from data by means
of a two-step process: estimating a good ordering of the variables (thus
reducing the search space for the belief network learning algorithm) and then
using a learning algorithm that exploits this ordering. The search for a good
ordering is carried out by means of a simulated annealing method, which uses a
function based on independence tests of order zero and one to measure the
fitness. The algorithm that uses this ordering to learn the network is a member
of the BENEDICT family, whose main characteristics are its hybrid nature and the
use of d-separating sets of minimum size.

In addition to the specific algorithms that we use to develop our method,
the main methodological difference with respect to other approaches [14,17] is
that the two subprocesses are run independently of each other (this means
that we carry out two 'simple' different search processes (over different spaces)
instead of a single search process which intermingles the orderings and the graph
structures).

In general, to obtain an optimal solution to the problem of finding a causal
ordering, it would be necessary to learn the network (i.e., to have information
about the complete set of valid conditional independence statements). Our
experiments show that our approximate method (based on conditional
independence tests of low order, for reasons of reliability, expressiveness and
efficiency) is quite successful. Its combination with BENEDICT-step gives a very
general and competitive algorithm for learning belief networks. However,
thorough experimental work, using networks of different complexity, is necessary
in order to obtain definitive conclusions.

In future work we plan to continue the development of heuristic rules
allowing the algorithm BENEDICT-step (or any other learning algorithm that
requires an ordering) to make local rearrangements in the ordering to improve
the quality of the learned network. We will also study methods that refine a
given ordering (e.g., the output ordering provided by our algorithms) to obtain
an optimal solution. These methods could be based on the idea of the reliability
of the particular position of any given variable x_i in the ordering, which in
turn is directly related to the number of 0-1 independences in which this variable
x_i is involved.
Abstract. An important class of methods for learning belief networks
from data are those based on the use of a scoring metric, to evaluate the
fitness of any given candidate network to the database, and a search
procedure to explore the set of candidate networks. In this paper we
propose a new method that carries out the search not in the space of
directed acyclic graphs but in the space of the orderings of the variables
that compose the graphs. Moreover, we apply a new stochastic search
method to this problem: Variable Neighborhood Search.
We also experimentally compare our methods with some other search
procedures commonly used in the literature.

Keywords: Belief Networks, Causal Orderings, Learning, Variable
Neighborhood Search, Stochastic Hill-Climbing Search.



1 Introduction 

Belief Networks (BNs), also known as Bayesian Networks or Causal Networks, 
are knowledge representation tools able to efficiently manage the dependence 
and independence relationships among the random variables that compose the 
problem domain we want to model. This representation has two components: 
a) a graphical structure, more precisely a directed acyclic graph (dag), and b) 
a set of parameters, which together specify a joint probability distribution over 
the random variables [20]. In belief networks, the graphical structure represents
dependence and independence relationships. The numerical component is a col- 
lection of conditional probability measures, which shape the relationships. 

Once we have the belief network specified, it constitutes an efficient device 
to perform inference tasks. However, there still remains the previous problem of 
building such a network. So, an interesting task is to develop automatic meth- 
ods capable of learning the network directly from data, as an alternative or a 
complement to the method of eliciting opinions from experts. 
Nowadays, the problem of learning or estimating a belief network from data
is receiving increasing attention within the community of researchers into
uncertainty in artificial intelligence. Algorithms for learning (the structure of)
BNs have been studied basically from two points of view: methods based on
conditional independence tests [5,6,7,22,23] and methods based on the
optimization of a scoring metric [12,16,17]. This classification is neither
exhaustive nor strict; there also exist algorithms that use a combination of
these two methods [1,2,13,21]. In this paper we only consider learning methods
based on a scoring metric.

As learning belief networks is, in general, an NP-hard problem [11], we have to
solve it with heuristic methods. Most existing scoring-based learning algorithms
apply standard heuristic search techniques, such as greedy hill-climbing,
simulated annealing (local search), genetic algorithms, etc. In this paper we
focus on local search methods, more precisely stochastic hill-climbing methods.
These methods examine only possible local changes at each step, and apply the one
that leads to the greatest improvement in the scoring function. When the search
process is carried out in the space of dags, the usual choices for local changes
are arc addition, arc deletion and arc reversal. Thus, there are O(n^2) possible
changes, where n is the number of variables.

However, several authors [18,14,8] have shown that the space of orderings
of the variables is much 'smoother' than the space of dags. Moreover, it is also
known that, by providing a good ordering of the variables, the learning
algorithms become more efficient and accurate. In fact, there are a number of
algorithms that need to use such an ordering [1,2,7,10,12]. Therefore, our
proposal is to develop learning methods that carry out the search process in the
space of the orderings instead of the space of dags.

The search method that we are going to adapt to our problem is, in ad- 
dition to classical hill-climbing, the recently developed Variable Neighborhood 
Search (VNS) [15,19], which is a metaheuristic that uses a systematic change of 
neighborhood within a randomized local search algorithm. 

The paper is structured as follows: we begin in Section 2 with the
preliminaries. In Section 3 we formalize our proposal of learning belief networks
by searching in the space of the orderings: we define our search space, the
admissible local changes to move within this space and how to efficiently carry
out the evaluation of the different orderings. In Section 4 we introduce the
Variable Neighborhood Search. In Section 5 we propose two learning algorithms
based on orderings: one uses a hill-climbing search and the other uses VNS. In
Section 6 we present the experimental evaluation. Finally, Section 7 contains the
concluding remarks.

2 Preliminaries 

In this section we briefly review BNs and how to learn them. A BN is a directed 
acyclic graph G = (V,E), where V, a set of nodes, represents the system vari- 
ables and E, a set of arcs, represents the dependence relationships among the 
variables. A set of parameters is also stored for each variable in V, usually
conditional probability distributions. For each variable x_i ∈ V we have a family of
conditional distributions P(x_i | Pa_G(x_i)), where Pa_G(x_i) represents the parent
set of the variable x_i. From these conditional distributions we can recover the
joint distribution over V:

$$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid Pa_G(x_i)) \qquad (1)$$

This expression represents a decomposition of the joint distribution. The
dependence/independence relationships that make this decomposition possible
are graphically encoded (through the d-separation criterion [20]) by means of
the presence or absence of direct connections between pairs of variables.

The problem of learning a BN can be stated as follows: given a training set
D = {v^1, ..., v^m} of instances of V, find the BN that best matches D. The
common approach to this problem is to introduce a scoring function, f, that
evaluates each network with respect to the training data, and then to search for
the best network according to this score. Different Bayesian and non-Bayesian
scoring metrics can be used [1,4,12,16,17].

A desirable and important property of a metric is its decomposability in the
presence of full data, i.e., the scoring function can be decomposed in the following
way:

$$f(G : D) = \sum_{i=1}^{n} f(x_i \mid Pa_G(x_i) : N_{x_i, Pa_G(x_i)}) \qquad (2)$$

where N_{x_i, Pa_G(x_i)} are the statistics of the variable x_i and Pa_G(x_i) in D, i.e.,
the number of instances in D that match each possible instantiation of x_i and
Pa_G(x_i). The decomposition of the metric is very important for the learning
task: a local search procedure that changes one arc at each move can efficiently 
evaluate the improvement obtained by this change. Such a procedure can reuse 
the computations made in previous stages. An example is a greedy hill-climbing 
method that at each step performs the local change that yields the maximal gain, 
until it reaches a local maximum. As this procedure is trapped in the first local 
maximum it reaches, several methods for avoiding this situation have been used, 
such as stochastic hill-climbing, simulated annealing, tabu search, etc. The main 
representative of stochastic hill-climbing is hill-climbing with random restart, 
which has been used by several authors with relative success (see [16] for more 
details). This fact has motivated us to try a new search method based on the
same principles as the previous one, but with a systematic and reasonable
random search in a larger neighborhood at each step if the current local search
does not improve the best current maximum. This method, VNS [19], has been
applied to solve optimization problems with successful results.

For a dag G, given a causal ordering θ (i.e., an ordering compatible with the
topology of the dag¹), the following independence relationships are true: x_i is
conditionally independent of all the variables that precede it in the ordering,
given its parent set Pa_G(x_i), for all x_i. This fact provides a systematic method
to build belief networks: for each node x_i, the parents of x_i in the dag are
the minimal subset of predecessors of x_i (in the ordering θ) which makes x_i
conditionally independent of the rest of its predecessors.

¹ If there is an arc x_i → x_j, then θ(x_i) < θ(x_j).

However, different orderings may produce different networks. We would prefer
those networks that are able to represent as many true independence
relationships as possible (i.e., having as few arcs as possible). For that reason it
makes sense to search for the best ordering.

3 Searching in the Space of the Orderings 

Let us assume that we want to find a belief network for a problem having n
variables, V = {x_1, x_2, ..., x_n}, and we have a database of cases D. Although we
are looking for a network, we are going to perform a main search process in the
space of the orderings for the variables in V, and this search will be guided by
a scoring function that evaluates the network obtained from the given ordering
by means of a secondary search process (in the space of the dags compatible
with this ordering).

So, our search space is the set of n! orderings, θ, of the variables in V (i.e.,
the set of permutations of n elements). Now, we have to define the operator to
move from one configuration to another neighboring configuration in this space.
We propose to use the interchange between two positions i and j in the sequence
defining an ordering. More precisely, if θ is the current configuration, then the
n(n − 1)/2 neighboring configurations of θ are those orderings θ_ij, where i < j,
defined as follows:

Let x_u and x_v be such that θ(x_u) = i and θ(x_v) = j. Then

$$\theta_{ij}(x_k) = \begin{cases} \theta(x_k) & \text{if } x_k \neq x_u \text{ and } x_k \neq x_v \\ j & \text{if } x_k = x_u \\ i & \text{if } x_k = x_v \end{cases}$$

Now, we have to decide how to evaluate the quality of an ordering θ. Our
proposal is to use a scoring metric, f, defined for dags and to perform a search
process in the space of dags compatible with θ. The scoring value of the obtained
dag G_θ, f(G_θ : D), will be the value of θ (f(θ : D) = f(G_θ : D)). For example, we
can use a (deterministic) hill-climbing algorithm with operators of arc addition
and arc removal (arc reversal makes no sense because the ordering is fixed). In
other words, we have to find the best parent set of each variable x_k among the
variables that precede x_k in the ordering θ. The search of the parent set of a
variable x_k can be done independently of the parent sets of the other variables.
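
The evaluation of an ordering described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the toy local score and the `TRUE_PARENTS` table are hypothetical stand-ins and not part of the paper; any decomposable metric (for instance the K2 metric) could be plugged in as `local_score`.

```python
def greedy_parent_search(node, predecessors, local_score):
    """Hill-climb over parent sets of `node` using arc addition and arc removal."""
    parents = frozenset()
    best = local_score(node, parents)
    while True:
        # Neighbors: add one predecessor as a parent, or remove one current parent.
        candidates = [parents | {p} for p in predecessors if p not in parents]
        candidates += [parents - {p} for p in parents]
        if not candidates:
            return parents, best
        top = max(candidates, key=lambda c: local_score(node, c))
        top_score = local_score(node, top)
        if top_score <= best:  # local maximum reached
            return parents, best
        parents, best = top, top_score


def evaluate_ordering(order, local_score):
    """Score an ordering as the sum of the local scores of the parent sets found."""
    dag, total = {}, 0.0
    for position, node in enumerate(order):
        parents, score = greedy_parent_search(node, order[:position], local_score)
        dag[node] = parents
        total += score
    return total, dag


# Hypothetical toy metric: penalize every missing or spurious parent.
TRUE_PARENTS = {"a": frozenset(), "b": frozenset({"a"}), "c": frozenset({"a", "b"})}


def toy_local_score(node, parents):
    return -len(parents ^ TRUE_PARENTS[node])


good_score, good_dag = evaluate_ordering(["a", "b", "c"], toy_local_score)
bad_score, bad_dag = evaluate_ordering(["c", "b", "a"], toy_local_score)
```

Because each parent set is searched only among the predecessors of the variable, the parent search for one variable never touches the parent sets of the others, which is what makes the per-variable searches independent.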

However, if the metric f being used is decomposable, we should try to take
advantage of this fact to reduce the complexity of evaluating an ordering θ_ij, by
using as much information about the evaluation of θ as possible. As θ has
already been evaluated, we know that

$$f(\theta : D) = \sum_{k=1}^{n} f(x_k \mid Pa_{G_\theta}(x_k) : N_{x_k, Pa_{G_\theta}(x_k)})$$

Therefore, for the nodes x_k such that θ(x_k) < i or θ(x_k) > j, the set of
predecessors of x_k will be the same for θ and for θ_ij, so that we can be sure that

$$Pa_{G_{\theta_{ij}}}(x_k) = Pa_{G_\theta}(x_k)$$

and we do not need to calculate f(x_k | Pa_{G_θij}(x_k) : N_{x_k, Pa_{G_θij}(x_k)}).

For the nodes x_k such that i ≤ θ(x_k) ≤ j, their sets of predecessors change,
so that we are forced to search again for their parent sets. For each one of these
nodes x_k, we start from an empty parent set and, by applying the operators of
arc addition and arc removal, perform a hill-climbing search.
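
A small sketch may help to see which variables have to be re-evaluated after an interchange. The helper names are hypothetical, and positions are 0-based here (the text uses 1-based positions); the point is only that the predecessor set changes exactly for the variables at positions i through j, both ends included.

```python
def swap_positions(order, i, j):
    """Return a copy of `order` with the variables at positions i and j interchanged."""
    new_order = list(order)
    new_order[i], new_order[j] = new_order[j], new_order[i]
    return new_order


def predecessors(order):
    """Map each variable to the frozenset of variables that precede it."""
    return {x: frozenset(order[:pos]) for pos, x in enumerate(order)}


def variables_to_reevaluate(order, i, j):
    """Variables whose predecessor set differs between `order` and its (i, j) swap."""
    before = predecessors(order)
    after = predecessors(swap_positions(order, i, j))
    return {x for x in order if before[x] != after[x]}


order = ["a", "b", "c", "d", "e", "f"]
changed = variables_to_reevaluate(order, 1, 4)  # the variables at positions 1..4
```

For all other variables the local score obtained when evaluating the original ordering can simply be reused.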

With the aim of improving the efficiency of the search process, we are going
to restrict, by means of a parameter, r (radius), the number of neighboring
configurations. The only admissible neighbors of a given ordering θ are those
orderings θ_ij such that the ‘distance’ between the variables to be interchanged is
not greater than r, i.e., |j − i| ≤ r. This speeds up the search process,
because each configuration has fewer neighbors (exactly r(n − (r + 1)/2)), and the
discarded neighbors are precisely the ones whose evaluation is more complex. A
radius r = n − 1 is equivalent to no restriction. If the starting point of the search
process is a good ordering, we believe that a drastic change in this ordering (i.e.,
to interchange two very distant nodes) is not expected to produce an important
improvement in the score. In any case, an interchange between two distant nodes
x_u and x_v can also be obtained by performing successive interchanges involving
x_u, x_v and some intermediate nodes (for example, an interchange of length |j − i|
may be obtained by means of three interchanges of length |j − i|/2).
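
The radius-restricted neighborhood and its size can be checked with a short sketch. The function names are hypothetical and positions are 0-based; the closed form is the one stated above, r(n − (r + 1)/2), written with integer arithmetic.

```python
def radius_neighbors(order, r):
    """Yield ((i, j), ordering) for every interchange of positions i < j with j - i <= r."""
    n = len(order)
    for i in range(n - 1):
        for j in range(i + 1, min(i + r, n - 1) + 1):
            neighbor = list(order)
            neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
            yield (i, j), neighbor


def neighbor_count(n, r):
    """Closed form r(n - (r + 1)/2) for the number of admissible interchanges."""
    return r * (2 * n - r - 1) // 2


neighbors = list(radius_neighbors(list(range(6)), 2))
```

With r = n − 1 the count reduces to n(n − 1)/2, i.e., the unrestricted neighborhood.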

4 Variable Neighborhood Search 

In this section we review the rules of the basic VNS and apply them for learning 
belief networks. 

Let us denote a finite set of pre-selected neighborhood structures with N_k
(k = 1, ..., k_max), and let N_k(x) be the set of solutions in the k-th neighborhood
of x (heuristic local search usually uses one neighborhood structure, i.e., k_max =
1). The basic VNS heuristic comprises the following steps:

Initialization. Select the set of neighborhood structures N_k, k = 1, ..., k_max,
that will be used in the search; find an initial solution x; choose a stopping
criterion.

Repeat the following until the stopping criterion is met:

(a) Set k = 1. Until k = k_max, repeat the following steps:

(a.1) Shaking. Generate a solution x' at random from the k-th neighborhood
of x (x' ∈ N_k(x)).

(a.2) Local search. Apply some local search method with x' as the initial
solution; denote with x'' the solution obtained as local optimum.

(a.3) Move or not. If this local optimum is better than the incumbent,
move there (x ← x''), and continue the search with N_1 (k = 1);
otherwise, set k = k + 1.

The stopping condition may be based, for example, on maximum running
time, maximum number of iterations (of step (a)), or maximum number of iter- 
ations (of step (a)) between two improvements. Note that point x' is generated 
at random in order to avoid cycling, which might occur if any deterministic rule 
were used. 
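
The steps of the basic VNS can be sketched in a few lines. This is a minimal illustration for a maximization problem, not the authors' implementation: the toy objective (negative number of inversions of a permutation), the adjacent-swap local search and all function names are assumptions chosen only to make the sketch self-contained, and the stopping criterion is a plain iteration budget.

```python
import random


def basic_vns(initial, score, shake, local_search, k_max, max_iterations, rng):
    """Basic VNS for maximization, following steps (a.1)-(a.3)."""
    best, best_score = initial, score(initial)
    for _ in range(max_iterations):  # stopping criterion: iteration budget
        k = 1
        while k <= k_max:
            candidate = shake(best, k, rng)             # (a.1) shaking in N_k
            candidate = local_search(candidate, score)  # (a.2) local search
            candidate_score = score(candidate)
            if candidate_score > best_score:            # (a.3) move or not
                best, best_score = candidate, candidate_score
                k = 1
            else:
                k += 1
    return best, best_score


def shake(order, k, rng):
    """Random point of the k-th neighborhood: k random interchanges of two positions."""
    shaken = list(order)
    for _ in range(k):
        i, j = rng.sample(range(len(shaken)), 2)
        shaken[i], shaken[j] = shaken[j], shaken[i]
    return shaken


def adjacent_swap_hill_climbing(order, score):
    """Greedy ascent that only interchanges adjacent positions."""
    current, current_score = list(order), score(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(current) - 1):
            neighbor = list(current)
            neighbor[i], neighbor[i + 1] = neighbor[i + 1], neighbor[i]
            neighbor_score = score(neighbor)
            if neighbor_score > current_score:
                current, current_score = neighbor, neighbor_score
                improved = True
    return current


def negative_inversions(order):
    """Toy objective: 0 for the sorted permutation, negative otherwise."""
    n = len(order)
    return -sum(1 for a in range(n) for b in range(a + 1, n) if order[a] > order[b])


best, best_score = basic_vns([4, 2, 0, 3, 1], negative_inversions, shake,
                             adjacent_swap_hill_climbing, k_max=3,
                             max_iterations=2, rng=random.Random(0))
```

The toy objective has no poor local maxima, so it only demonstrates the control flow; on a harder objective the shaking step is what lets the search leave a local maximum.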

Once we have an appropriate local search method for an optimization problem,
it is easy to program steps (a.1) and (a.3) of the basic VNS. For example,
if N_k is obtained by k-interchanges of solution attributes (as will be our case),
only a few lines have to be added to an existing code for a local search method.

The basic VNS is a descent (ascent) first improvement method. Without
much additional effort it could be transformed into a descent-ascent method
(in step (a.3) set also x ← x'' with some probability even if the solution is
worse than the incumbent). Of course, this variant is reminiscent of simulated
annealing. Other variants of the basic VNS include:

— Introduce k_min and k_step, two parameters that control the movement
between neighborhood structures, i.e., in the previous algorithm, instead of
k = 1, set k = k_min and instead of k = k + 1, set k = k + k_step. These
parameters guide the intensification and diversification of the search.

— Remove the local search. This variant, which is denoted as Reduced VNS,
is useful for very large problems for which local search is costly. This works
in a similar way to the Monte-Carlo method but in a more systematic way.
Its relationship with the Monte-Carlo method is the same as that of VNS to
multi-start methods.

When using more than one neighborhood structure in the search, as is
done in VNS, the following problem-specific questions have to be answered:

— Which N_k should be used, and how many of them?

— What should be their order in the search? 

— What search strategy should be used in changing neighborhoods? 

Furthermore, we must decide what local search routine will be used in the 
local search step. 

5 Learning Algorithms Based on Orderings 

We have a database D = {v^1, ..., v^m}, containing m instances of V. We assume
a given decomposable scoring metric f(G : D) for dags. Let Θ_n be the set of
all the orderings of n elements and 𝒢_θ be the family of all dags G whose set of
vertices is V and whose arcs are compatible with the ordering θ. The problem
considered is then:

$$\text{Find } \theta^* = \arg\max_{\theta \in \Theta_n} f(\theta : D) \qquad (3)$$

where

$$f(\theta : D) = f(G_\theta : D) = \max_{G \in \mathcal{G}_\theta} f(G : D) \qquad (4)$$

Thus, we first find the best dag, G_θ (according to the selected metric f),
compatible with an ordering θ, and next select the ordering θ* that has produced the
best dag. The dag G_θ* is the desired solution of our learning problem. We tackle
this problem from a heuristic point of view: the two optimization processes are
solved using search methods.

The (approximate) solution to the problem in Equation (4) will be obtained 
by using the search process of dags described in Section 3 (i.e., a hill-climbing 
search of the best parent set of each node in V). 

To solve the problem in Equation (3), we propose two alternatives. The first
one is to also use a hill-climbing search in the space of the orderings (with
the operator of interchange of two positions described in Section 3, and a fixed
radius). We call this algorithm HCSO (Hill-Climbing Search based on Orderings).

The second alternative is to use a VNS. This algorithm will be called VNSO
(Variable Neighborhood Search based on Orderings). To do this, we need to
define the neighborhood structures N_k. N_1 will be the neighborhood defined by
the operator of interchange of any two positions, i.e.,

$$\theta' \in N_1(\theta) \iff \theta'(x_u) = \theta(x_v),\ \theta'(x_v) = \theta(x_u) \text{ and } \theta'(x_k) = \theta(x_k)\ \forall x_k \neq x_u, x_v$$

N_2 will be defined by the interchange of two pairs of positions, i.e.,

$$\theta'' \in N_2(\theta) \iff \theta'' \in N_1(\theta') \text{ and } \theta' \in N_1(\theta)$$

Similarly,

$$\theta'' \in N_k(\theta) \iff \theta'' \in N_1(\theta') \text{ and } \theta' \in N_{k-1}(\theta)$$

The search strategy between neighborhoods that we are going to use is the one
used in the basic VNS (k = 1 and at each step k = k + 1). As the stopping criterion
we use the maximum number of iterations between two improvements, together
with a maximum number of total iterations.

The local search chosen for step (a.2) of VNSO is just HCSO. However,
instead of using HCSO with a fixed radius r, and in accordance with the search
strategy used by VNS, we propose an updating scheme for this parameter: when
we move from N_k to a greater neighborhood N_{k+1}, we also increase the radius
(from r to r + 1), and when we move to N_1, the radius is set to its initial value.

6 Experimental Evaluation of the Algorithms 

In order to test the behavior of the methods proposed in the paper, we have 
selected the ALARM network [3]. This network has 37 nodes and 46 arcs and is 
used for diagnosis in a medical domain. It has been considered as a benchmark 
for evaluating learning algorithms. All the experiments have been carried out on 
the first 10000 cases of the ALARM database (and comparing the results with 
the true ALARM network). The scoring function used in all the experiments is 
the K2 metric [12]. 

We have run HCSO with two different radii: r = 36 (the maximum radius
in this case) and r = 7. For VNSO, we have used k_max = 7 and the initial radius
is r = 7. We have also used three different options to obtain the initial solution:

— S-PC: Learning a network using the PC algorithm [22] and extracting a
topological ordering. 

— S-0: To start from an empty dag and an arbitrary ordering (in our case we 
used the ordering of the variables in the database). 

— S-K2SN: To initialize the search with the result of the algorithm K2SN [9]. 
This algorithm is an extension of the algorithm K2 that does not require a 
given ordering: Starting from an empty graph, K2SN iteratively determines 
the best node to add, until all the nodes have been included in the graph. At 
each step, the best parent set for each node not previously introduced in the 
structure is selected (among the nodes already included in the graph, as the 
K2 algorithm does) and the node producing the best score is added to the 
graph, linking it to its corresponding parent set. In this way K2SN returns 
an ordering and a graph compatible with this ordering. 

As stopping criterion, we have chosen a maximal number of two iterations 
without improvement, combined with a maximal number of three total iterations. 

In order to compare our algorithms with the classical local search methods,
we also use the classical hill-climbing in the space of dags (HCST), with operators
of arc addition, arc removal and arc reversal, and the same three initialization
methods.



6.1 Experimental Results 

The information we have collected from each experiment is the following: the 
value of the K2 metric (log) of the best individual evaluated; the number of 
arcs added (A), deleted (D) and inverted (I), compared with the true ALARM 
network; the number of iterations carried out by the algorithm, i.e., the total 
number of hill-climbing searches carried out (nS); the total number of individu- 
als evaluated during the search (nE) ; the total number of statistics (local scores) 
used (tS); the total number of different statistics calculated (tSC) (using a hash- 
ing method, we do not need to recalculate a local score already computed); the 
mean number of variables involved in the statistics (mV); finally, we also display 
the value (KL) of the best individual evaluated, which is defined as follows: 



$$KL(G : D) = \sum_{i=1,\ Pa(x_i) \neq \emptyset}^{n} Dep(x_i, Pa(x_i)) \qquad (5)$$

where:

$$Dep(X, Y) = \sum_{x, y} P(x, y) \log \frac{P(x, y)}{P(x)\, P(y)} \qquad (6)$$



Note that KL(G : D) is a decreasing monotonic transformation of the Kullback
distance between the probability distribution associated with the database
and the probability distribution associated with the network G [5] (we use this
transformation because it can be calculated very efficiently, whereas the
computation of the Kullback distance has an exponential complexity). The
interpretation of KL(G : D) is: the higher this parameter, the better the network.
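
As an illustration of why this quantity is cheap to compute, the following sketch evaluates it directly from counts in the database. It rests on an assumption: Dep(X, Y) is taken to be the empirical mutual information between a variable and the joint configuration of its parents, which is the quantity that the relationship to the Kullback distance suggests. The function names, the data layout (a list of dictionaries) and the use of natural logarithms are choices made for the example only.

```python
import math
from collections import Counter


def dep(data, x, ys):
    """Empirical mutual information between column x and the joint columns ys."""
    m = len(data)
    joint = Counter((row[x], tuple(row[y] for y in ys)) for row in data)
    px = Counter(row[x] for row in data)
    py = Counter(tuple(row[y] for y in ys) for row in data)
    # P(x,y) / (P(x) P(y)) = n * m / (count_x * count_y) for a joint count n.
    return sum((n / m) * math.log((n * m) / (px[a] * py[b]))
               for (a, b), n in joint.items())


def kl_measure(data, parents):
    """Sum of Dep(x_i, Pa(x_i)) over the variables with a non-empty parent set."""
    return sum(dep(data, x, sorted(ps)) for x, ps in parents.items() if ps)


data = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 0},
    {"a": 1, "b": 1, "c": 1},
]
# b copies a, c is independent of a in this toy database.
value = kl_measure(data, {"a": set(), "b": {"a"}, "c": {"a"}})
```

Only one pass over the data per variable is needed, and only configurations that actually occur are stored.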

So, we have collected five measures of the quality of the learned networks (K2,
KL, A, D and I) and five measures of the complexity of the search methods (nS,
nE, tS, tSC and mV). nE represents the number of dags evaluated in the case of
HCST, and the number of orderings evaluated for HCSO and VNSO. tSC is an
interesting measure because computing a new local score (not previously stored)
requires accessing the database, which can be a time-consuming process. The
complexity of computing these local scores increases exponentially with the
value of mV. Although the cost of accessing the value of a stored local score
is much smaller, it is also interesting to know the value tS, because all these
local scores have actually been used to compute the (global) scoring values. The
measures nE, tS and tSC do not include the cost of the initialization step in the
S-PC case (which is quite high).

The results of the experiments are displayed in Tables 1, 2 and 3. For VNSO,
the experiments have been carried out ten times. Table 2 displays the average
value μ and the standard deviation σ of each item.



Table 1. Results for HCSO.

                  Radius = 36                           Radius = 7
       Empty       K2SN        PC           Empty       K2SN        PC
K2     -47080.22   -47076.20   -47109.89    -47513.60   -47079.60   -47117.67
KL     9.2740      9.2740      9.2655       9.2687      9.2762      9.2639
A      1           1           1            23          4           2
D      1           1           2            3           1           3
I      0           0           0            11          0           0
nS     1           1           1            1           1           1
nE     17316       3330        3996         7280        1300        3120
tSC    51609       20160       26263        18333       11384       17266
tS     1.39E7      2.52E6      2.96E6       2.88E6      4.27E5      9.73E5
mV     4.83        4.35        4.39         4.67        4.15        4.36



The best result found by the search algorithms is a network with a value 
of the K2 metric equal to -47076.20 (VNSO-S-0, VNSO-S-K2SN and HCSO-S- 
K2SN are all able to obtain this network). Note that the K2 and KL values of 
the true ALARM network for the database being used are equal to -47086.57 
and 9.2744, respectively. 

First, we have to note that the initialization method used is quite relevant 
from the point of view of the quality of the obtained result (K2, A, D, and I 
values) and the efficiency of the search process (nS, nE, tSC and tS values) for 
all the methods (Table 4 displays the K2 and KL values for the different initial 
networks). The best initialization is always produced by the K2SN algorithm 
(particularly, a simple HCSO initialized with K2SN produces the best result). 
This is not surprising for VNSO and HCSO because K2SN is explicitly designed 
to work with orderings, but is somewhat surprising for HCST. Another interesting
result is that the more informed initialization S-PC is not better than the
‘vacuous’ initialization S-0 when searching in the space of the orderings (the
exception is HCSO with a small radius). It seems to us that PC directs the process
towards a suboptimal local maximum, which is difficult to offset by the search 
process. 



Table 2. Results for VNSO.

             Empty                  K2SN                   PC
       μ           σ          μ           σ          μ           σ
K2     -47087.39   12.26      -47076.49   0.90       -47110.08   0.41
KL     9.2731      0.004      9.2740      0.000      9.2655      0.000
A      4.5         3.03       1.8         0.42       1.9         0.32
D      1.6         0.52       1.0         0.0        2.0         0.0
I      0.8         1.03       0.0         0.0        0.0         0.0
nS     181.0       99.3       96.1        29.02      68.6        17.3
nE     327335      204091     179865      55924      99061       30844
tSC    68087       11680      49902       5664       54670       7832
tS     5.98E7      2.73E7     4.16E7      1.12E7     2.43E7      6.53E6
mV     4.96        0.06       4.79        0.08       4.85        0.1



The results obtained support the conclusion that searching in the space of
the orderings is a good idea: both VNSO and HCSO outperform HCST in all
the cases (except HCSO-S-0²).



Table 3. Results for HCST.

       Empty       K2SN        PC
K2     -47267.11   -47081.66   -47133.41
KL     9.2657      9.2761      9.2708
A      11          3           3
D      4           1           1
I      6           0           2
nS     1           1           1
nE     72335       15820       17901
tSC    3280        5384        1804
tS     1.47E5      5.56E4      3.65E4
mV     2.98        3.65        3.21



Focusing on the methods that search in the space of the orderings, restricting
the search process by using a small radius produces results slightly worse than
the unrestricted search, from the point of view of the solution quality, but with
an important gain in efficiency. We conjecture that a radius of about a half of
the maximum radius would be an optimal choice. On the other hand, VNSO
is better than HCSO if we use the same radius, as we could expect. However,
HCSO with maximum radius (r = 36) behaves even a bit better than VNSO
with small radius (r = 7). We also conjecture that VNSO equipped with a radius
of about one third of the maximum radius would produce excellent results.

² Remember that this initialization uses the ordering of the variables in the database,
which is a particularly bad ordering.

Table 4. K2 and KL values for the three initial networks.

       Empty     K2SN      PC
K2     -86822    -47515    -51083
KL     0.0       9.2205    8.3170



With respect to the complexity of the search methods, although HCSO evaluates
fewer individuals than HCST, the cost of evaluating each individual for
HCSO is greater than for HCST. Overall, although HCSO gives better results
than HCST, the latter is more efficient than the former. Obviously, the complexity
of VNSO increases considerably. Nevertheless, we have observed that VNSO
always finds the best individual in the first iteration (of step (a)), and due to
the stopping criterion selected, it needs to perform another complete iteration
to halt. So, the complexity of VNSO could be considerably reduced without
compromising the quality of the result by performing only one iteration.

7 Concluding Remarks 

In this work we have proposed a new strategy for learning belief networks based 
on searching for good orderings and searching for good networks compatible 
with a given ordering. Moreover, a new search method has been adapted to this 
problem. The proposed methods have improved the results obtained by other 
classical search methods that explore the space of dags. Nevertheless, a more 
systematic experimentation has to be done in order to confirm this conclusion. 
We also plan to study in the future other variants of VNS, as well as other 
operators for defining local changes in the space of the orderings. 

Acknowledgements. This work has been supported by the Spanish Comision
Interministerial de Ciencia y Tecnologia (CICYT), under project TIC 2000-1351.

Abstract. The K2 metric is a well-known evaluation measure (or scoring
function) for learning Bayesian networks from data [7]. It is derived
by assuming uniform prior distributions on the values of an attribute for 
each possible instantiation of its parent attributes. This assumption in- 
troduces a tendency to select simpler network structures. In this paper we 
modify the K2 metric in three different ways, introducing a parameter 
by which the strength of this tendency can be controlled. Our experi- 
ments with the ALARM network [2] and the BOBLO network [17] sug- 
gest that — somewhat contrary to our expectations — a slightly stronger 
tendency towards simpler structures may lead to even better results. 



1 Introduction 

Probabilistic inference networks — especially Bayesian networks [15] and Markov 
networks [14] — are well-known tools for reasoning under uncertainty in mul- 
tidimensional domains. The idea underlying them is to exploit independence 
relations between the attributes used to describe a domain — an approach which 
has been studied extensively in the field of graphical modeling, see e.g. [12] — in 
order to decompose a multivariate probability distribution into a set of (condi- 
tional or marginal) distributions on lower-dimensional subspaces. Early efficient 
implementations include HUGIN [1] and PATHFINDER [9]. 

In this paper we focus on Bayesian networks. Formally, a Bayesian network 
represents a factorization of a multivariate probability distribution that results 
from an application of the product theorem of probability theory and a simplifi- 
cation of the factors achieved by exploiting conditional independence statements 
of the form P(A | B, X) = P(A | X), where A and B are attributes and X is a
set of attributes. Hence the represented joint distribution can be computed as

$$P(A_1, \ldots, A_n) = \prod_{i=1}^{n} P(A_i \mid par(A_i)),$$

where par(A_i) is the set of parents of attribute A_i in a directed acyclic graph
that is used to represent the factorization.

Bayesian networks provide excellent means to structure complex domains
and to draw inferences. However, constructing a Bayesian network manually can
be tedious and time-consuming. Considerable expert knowledge (domain knowledge
as well as mathematical knowledge) is necessary to get it right. Therefore
an important line of research is the automatic construction of Bayesian networks 
from a database of sample cases. Most algorithms for this task consist of two 
ingredients: a search method to traverse the possible network structures and an 
evaluation measure or scoring function to assess the quality of a given network. 

In this paper we consider only the latter component, i.e., the scoring function.
A desirable property of a scoring function is decomposability, i.e., that it
can be computed by aggregating local assessments of subnetworks or even single
edges. Intuitively, a decomposable scoring function assesses the significance
of dependences between attributes in the database, in order to decide which
edges between attributes are needed in the Bayesian network. An example of
a decomposable scoring function is mutual information [13,6]. Decomposable
scoring functions are often used to select parents for each attribute, for example,
in a greedy manner as in the K2 algorithm [7].

Due to the analogy of selecting parents for an attribute to the induction 
of a decision tree, there is a large variety of scoring functions [11,19,3]. Each of 
them exhibits a different sensitivity w.r.t. dependences in the data: Some scoring 
functions tend to select more edges/parents than others. Since in a cooperation 
with DaimlerChrysler, in which we work on fault diagnosis, it turned out that it 
is of practical importance to be able to control this sensitivity, we searched for 
parameterized families of scoring functions, where the parameter controls the 
sensitivity. In this paper we report the results of this research, which led us to 
certain variants of the K2 metric. 



2 The K2 Metric 

The K2 metric was derived first in [7], where it was used in the K2 algorithm, 
and later generalized in [10] to the Bayesian Dirichlet metric. It is the result of 
a Bayesian approach to learning Bayesian networks from data. The idea is as 
follows [7]: We are given a database D of sample cases over a set of attributes, 
each having a finite domain. It is assumed (1) that the process that generated 
the database can be accurately modeled as a Bayesian network, (2) that given 
a Bayesian network model cases occur independently, and (3) that cases are 
complete. Given these assumptions we can compute from a given network structure
B_S and a set of conditional probabilities B_P associated with it the probability
of the database, i.e., we can compute P(D | B_S, B_P). Adding an assumption
about the prior probabilities of the network structures and the probability
parameters and integrating over all possible sets of conditional probabilities B_P
for the given structure B_S yields P(B_S, D):

$$P(B_S, D) = \int_{B_P} P(D \mid B_S, B_P)\, f(B_P \mid B_S)\, P(B_S)\, dB_P,$$

where f is the density function on the space of possible conditional probabilities
and P(B_S) is the prior probability of the structure B_S. P(B_S, D) can be used




242 C. Borgelt and R. Kruse 



to rank possible network structures, since obviously

$$\frac{P(B_{S_1} \mid D)}{P(B_{S_2} \mid D)} = \frac{P(B_{S_1}, D)}{P(B_{S_2}, D)}.$$

With the additional assumption that the density functions f are marginally
independent for all pairs of attributes and for all pairs of instantiations of the
parents of an attribute, we arrive at (see [7] for details):

$$P(B_S, D) = P(B_S) \prod_{k=1}^{n} \prod_{j=1}^{q_k} \int \cdots \int \left( \prod_{i=1}^{r_k} \theta_{ijk}^{N_{ijk}} \right) f(\theta_{1jk}, \ldots, \theta_{r_k jk})\, d\theta_{1jk} \cdots d\theta_{r_k jk}.$$

Here n is the number of attributes of the network, q_k is the number of distinct
instantiations of the parents attribute k has in the structure B_S, and r_k is the
number of values of attribute k. θ_ijk is the probability that attribute k assumes
the i-th value of its domain, given that its parents are instantiated with the j-th
combination of values, and N_ijk is the number of cases in the database in which
attribute k is instantiated with its i-th value and its parents are instantiated
with the j-th value combination.

In the following we confine ourselves to single factors of the outermost product
and thus drop the index k. That is, we consider only single attribute scores.
This is justified because of the factorization property of Bayesian networks. Using
a uniform prior density on the parameters θ_ij, namely f(θ_1j, ..., θ_rj) = (r − 1)!,
and assuming that the possible network structures are equally likely yields as
a scoring function [7]:



$$K2(A \mid par(A)) = \prod_{j=1}^{q} \frac{(r-1)!}{(N_{.j} + r - 1)!} \prod_{i=1}^{r} N_{ij}!$$

where A is a child attribute and par(A) is the set of its parents, r is the number
of values of the attribute A and q is the number of distinct instantiations of
its parent attributes. N_ij is the number of cases in which attribute A is
instantiated with its i-th value and its parents are instantiated with their j-th value
combination; N_.j = \sum_{i=1}^{r} N_ij. Note that in the derivation of the above function
the solution of Dirichlet's integral [8]

$$\int \cdots \int \prod_{i=1}^{r} \theta_{ij}^{N_{ij}}\, d\theta_{1j} \cdots d\theta_{rj} = \frac{\prod_{i=1}^{r} N_{ij}!}{(N_{.j} + r - 1)!}$$

was used, which we need again below.

The higher the value of the above scoring function K2 (i.e., its product over
all attributes), the better the corresponding network structure. To simplify the
computation of this measure, the logarithm of the above function is often used:

$$\log_2 K2(A \mid par(A)) = \sum_{j=1}^{q} \log_2 \frac{(r-1)!}{(N_{.j} + r - 1)!} + \sum_{j=1}^{q} \sum_{i=1}^{r} \log_2 N_{ij}!$$
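
The logarithmic form is what one would normally implement, since the factorials overflow quickly. The following sketch assumes the K2 score of a single attribute in its standard form, the product over parent instantiations j of (r − 1)!/(N_.j + r − 1)! times the product of N_ij!; the function names and the nested-list layout of the counts are hypothetical.

```python
import math


def log2_factorial(n):
    """log2(n!) via the log-gamma function, which avoids huge intermediate values."""
    return math.lgamma(n + 1) / math.log(2)


def log2_k2(counts):
    """log2 of the K2 score of one attribute.

    counts[j][i] is N_ij: the number of cases with the i-th attribute value
    under the j-th instantiation of the parents.
    """
    r = len(counts[0])
    total = 0.0
    for row in counts:
        n_j = sum(row)  # N_.j
        total += log2_factorial(r - 1) - log2_factorial(n_j + r - 1)
        total += sum(log2_factorial(n_ij) for n_ij in row)
    return total


# One binary attribute with one binary parent (two parent instantiations).
score = log2_k2([[2, 1], [0, 3]])
```

Sums of such per-attribute values give the logarithm of the score of a whole network structure.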
As already said above, the K2 metric was generalized to the Bayesian Dirichlet
metric in [10]. This more general scoring function is defined as

$$BD(A \mid par(A)) = \prod_{j=1}^{q} \frac{\Gamma(N'_{.j})}{\Gamma(N_{.j} + N'_{.j})} \prod_{i=1}^{r} \frac{\Gamma(N_{ij} + N'_{ij})}{\Gamma(N'_{ij})},$$

where Γ is the well-known generalized factorial,

$$\Gamma(x) = \int_0^\infty e^{-t} t^{x-1}\, dt, \qquad \forall n \in \mathbb{N}: \Gamma(n+1) = n!.$$

It is used to take care of the fact that N'_ij and N'_.j = \sum_{i=1}^{r} N'_ij, which represent
a prior distribution (see [10] for details), may not be integer numbers. Obviously,
the K2 metric results for the simple choice ∀i, j: N'_ij = 1, which very clearly
signifies the assumption of a uniform prior distribution.

This representation also makes it plausible why the K2 metric has a tendency 
to select simpler network structures, i.e., why algorithms using it are somewhat 
reluctant to add parent attributes. By the prior $N'_{ij} = 1$ the frequency distributions
are somewhat “leveled out” and the more so, the more parent attributes
there are. The reason is that the number of cases in the database for a given 
instantiation of the parent attributes is the smaller, the more parents there are, 
simply because each parent introduces an additional constraint. Hence the in- 
fluence of the data frequencies Nij is smaller for a larger number of parents and 
consequently an attribute seems to be less strongly dependent on its parents. 
The result is an inclination to reject a(nother) parent. 

Analogously, we can see why the Bayesian Dirichlet likelihood equivalent
uniform (BDeu) metric [5,10], which has $\forall i,j: N'_{ij} = \frac{s}{r \cdot q}$, where $s$ is a parameter
called the equivalent sample size, has a tendency to select more complex network
structures and tends to connect attributes with many possible values. Due to
the product $r \cdot q$ in the denominator the influence of the prior is reduced by
an additional parent and by parents with many possible values. The result is 
an increased influence of the data frequencies Nij for more parents and thus a 
tendency to add a(nother) parent attribute. 
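
The relation between the BD, K2, and BDeu priors can be illustrated with a short sketch. It assumes a single prior value shared by all cells (which covers both K2 and BDeu) and the same hypothetical `counts[j][i]` layout for the data frequencies; the function names are made up for this illustration:

```python
from math import lgamma, log

def log2_bd(counts, prior):
    """log2 BD(A | par(A)) with N'_ij = prior for all i, j (an assumption of this sketch)."""
    r = len(counts[0])
    prior_dot_j = r * prior                            # N'_.j = sum_i N'_ij
    total = 0.0
    for row in counts:
        total += lgamma(prior_dot_j) - lgamma(sum(row) + prior_dot_j)
        total += sum(lgamma(n + prior) - lgamma(prior) for n in row)
    return total / log(2)

def log2_bdeu(counts, s=1.0):
    """BDeu choice of the prior: N'_ij = s / (r * q), s being the equivalent sample size."""
    r, q = len(counts[0]), len(counts)
    return log2_bd(counts, s / (r * q))

counts = [[2, 0], [0, 2]]
print(log2_bd(counts, 1.0))      # prior 1 reproduces the K2 metric
print(log2_bdeu(counts, s=1.0))  # prior shrinks to 1/4 per cell for r = q = 2
```

With `prior = 1.0` the function reduces to the K2 metric, while the BDeu prior per cell gets smaller with every additional parent instantiation, which is the effect described above.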



3 Modifications of the K2 Metric 

In this section we introduce three modifications of the K2 metric, all of which 
contain a parameter through which the strength of the tendency of the K2 metric 
towards simpler network structures can be controlled. 



3.1 Weighted Data 

The argument given above to explain the tendency of the K2 metric directly
suggests an idea to control this tendency. Since the tendency depends on the
relation of the data frequencies $N_{ij}$ and the prior $N'_{ij} = 1$, one may consider
weighting either of them. Due to the numerical properties of the $\Gamma$-function,
especially its behavior for arguments less than 1, weighting the data frequencies
seems to be preferable. That is, we simply multiply the data frequencies with a
factor $\beta$, which we write as $\beta = (\alpha_1 + 1)^2$, since this form is advantageous for
the presentation of our experimental results (see below).

This factor can also be made plausible as follows: Formally the factor $\beta$ is
equivalent to the assumption that we observed the data $\beta$ times and thus we
artificially increase or reduce the statistical basis of the network induction. Of
course, a larger statistical basis allows us to justify a more complex structure, 
whereas a smaller basis allows us only to justify a simpler one. It should be 
noted, though, that we introduce this factor here only to study the properties 
of the K2 metric, not as a statistically justifiable correction factor. 

With such a factor we get the following family of scoring functions: 

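$$\mathrm{K2}^{(1)}_{\alpha_1}(A \mid \mathrm{par}(A)) = \prod_{j=1}^{q} \frac{\Gamma(r)}{\Gamma((\alpha_1+1)^2 N_{.j} + r)} \prod_{i=1}^{r} \Gamma((\alpha_1+1)^2 N_{ij} + 1).$$

Obviously, for $\alpha_1 = 0$ we have the standard K2 metric as it was described above.
For $\alpha_1 < 0$ we get a stronger, for $\alpha_1 > 0$ we get a weaker tendency to select
simpler network structures.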
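
A small numerical sketch may help to see the effect of the data weight. It assumes the weight has the squared form $(\alpha_1 + 1)^2$ and uses an invented toy table with a weak dependence; both the table and the function names are illustrative only:

```python
from math import lgamma, log

def log2_k2_weighted(counts, alpha1=0.0):
    """log2 of the data-weighted K2 score; counts[j][i] = N_ij (hypothetical layout)."""
    beta = (alpha1 + 1.0) ** 2                         # assumed form of the data weight
    r = len(counts[0])
    total = 0.0
    for row in counts:
        total += lgamma(r) - lgamma(beta * sum(row) + r)
        total += sum(lgamma(beta * n + 1.0) for n in row)
    return total / log(2)

def parent_gain(alpha1):
    """Score difference (in bits) between adding and not adding a parent."""
    with_parent = [[6, 4], [4, 6]]                     # weak dependence, invented data
    without_parent = [[10, 10]]                        # same 20 cases without the parent
    return (log2_k2_weighted(with_parent, alpha1)
            - log2_k2_weighted(without_parent, alpha1))

for alpha1 in (-0.5, 0.0, 1.0):
    print(alpha1, parent_gain(alpha1))
```

For this table the gain is negative at $\alpha_1 = 0$ (the parent is rejected) and positive at $\alpha_1 = 1$ (the parent is accepted), which matches the direction of the tendency stated above.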



3.2 Modified Prior 



In the derivation of the K2 metric it is assumed that the density functions on 
the spaces of conditional probabilities are uniform. However, after we found the 
best network structure w.r.t. the K2 metric, we no longer integrate over all con- 
ditional probabilities (e.g. when we propagate evidence in the induced network). 
Although, of course, it is possible in principle to average over several network 
structures, a single network is often preferred. Hence we fix the structure and 
compute estimates of the probabilities using, for example, Bayesian or maximum 
likelihood estimation. Therefore the idea suggests itself to reverse these steps. 
That is, we could estimate first for each structure the best conditional proba- 
bility assignments and then select the best structure based on these, then fixed, 
assignments. Formally, this can be done by choosing the density functions in 
such a way that the estimated probabilities have probability 1. Using maximum
likelihood estimation of a multinomial distribution we thus get

$$f(\theta_{1j}, \ldots, \theta_{rj}) = \prod_{i=1}^{r} \delta\!\left(\theta_{ij} - \frac{N_{ij}}{N_{.j}}\right),$$

where $\delta$ is Dirac’s $\delta$-function (or, more precisely, $\delta$-distribution, since it is not a
classical function), which is defined to have the following properties:



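$$\delta(t) = \begin{cases} +\infty & \text{for } t = 0, \\ 0 & \text{for } t \neq 0, \end{cases} \qquad \int_{-\infty}^{+\infty} \delta(t) \, dt = 1, \qquad \int_{-\infty}^{+\infty} \delta(t) \varphi(t) \, dt = \varphi(0).$$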
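
Inserting this density function into the formula for $P(B_S, D)$ derived above, we
get as a scoring function:

$$\mathrm{K2}^{(2)}_{\infty}(A \mid \mathrm{par}(A)) = \prod_{j=1}^{q} \int \cdots \int_{\theta_{ij}} \left( \prod_{i=1}^{r} \theta_{ij}^{N_{ij}} \right) \prod_{i=1}^{r} \delta\!\left(\theta_{ij} - \frac{N_{ij}}{N_{.j}}\right) d\theta_{1j} \ldots d\theta_{rj} = \prod_{j=1}^{q} \prod_{i=1}^{r} \left( \frac{N_{ij}}{N_{.j}} \right)^{N_{ij}}.$$
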



An interesting thing to note about this function is that obviously

$$N_{..} \cdot H(A \mid \mathrm{par}(A)) = -\log_2 \mathrm{K2}^{(2)}_{\infty}(A \mid \mathrm{par}(A)),$$

where $N_{..} = \sum_{j=1}^{q} N_{.j}$ and $H(A \mid \mathrm{par}(A))$ is the expected entropy of the probability
distribution on the values of attribute $A$ given its parents. Note that we
get the well-known mutual information (also called cross entropy or information
gain) [13,6,16] if we relate the value of this measure to its value for a structure
in which attribute $A$ has no parents, i.e.,

$$N_{..} \cdot I_{\mathrm{gain}}(A, \mathrm{par}(A)) = \log_2 \frac{\mathrm{K2}^{(2)}_{\infty}(A \mid \mathrm{par}(A))}{\mathrm{K2}^{(2)}_{\infty}(A \mid \emptyset)}.$$




In other words, mutual information turns out to be equivalent to a so-called 
Bayes factor of this metric. 
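
The equivalence between mutual information and this Bayes factor can be checked numerically. The sketch below uses an invented contingency table in a hypothetical `counts[j][i]` layout and the convention $0^0 = 1$; the function names are not from the paper:

```python
from math import log2

def log2_k2_dirac(counts):
    """log2 of prod_j prod_i (N_ij / N_.j)^N_ij, skipping empty cells (0^0 = 1)."""
    total = 0.0
    for row in counts:
        n_dot_j = sum(row)
        total += sum(n * log2(n / n_dot_j) for n in row if n > 0)
    return total

def mutual_information(counts):
    """Mutual information (in bits) between A and its parents, from relative frequencies."""
    n = sum(sum(row) for row in counts)
    column = [sum(col) for col in zip(*counts)]        # marginal counts of A
    info = 0.0
    for row in counts:
        n_dot_j = sum(row)
        for n_ij, n_i in zip(row, column):
            if n_ij > 0:
                info += (n_ij / n) * log2(n_ij * n / (n_i * n_dot_j))
    return info

counts = [[6, 4], [4, 6]]                              # invented example data
no_parents = [[sum(col) for col in zip(*counts)]]      # same cases, empty parent set
n_total = sum(sum(row) for row in counts)
lhs = n_total * mutual_information(counts)
rhs = log2_k2_dirac(counts) - log2_k2_dirac(no_parents)
print(lhs, rhs)
```

Both printed values agree up to floating-point rounding, for any table with at least one case per parent instantiation.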

This Bayesian justification of mutual information as a scoring function may 
be doubted, since in it the database is — in a way — used twice to assess the quality 
of a network structure, namely once directly and once indirectly through the 
estimation of the parameters of the conditional probability distribution. Formally 
this approach is not strictly correct, since the density function on the parameter 
space should be a prior distribution whereas the estimate we used clearly is a 
posterior distribution (since it is computed from the database). However, the
fact that mutual information, a well-known and well-founded scoring
function, results from it is very suggestive evidence that this approach is worth examining.

The above derivation of mutual information as a scoring function assumes
Dirac pulses at the maximum likelihood estimates for the conditional probabilities.
However, we may also consider the likelihood function directly, i.e.,

$$f(\theta_{1j}, \ldots, \theta_{rj}) = c_1 \cdot \prod_{i=1}^{r} \theta_{ij}^{N_{ij}}, \qquad c_1 = \frac{(N_{.j} + r - 1)!}{\prod_{i=1}^{r} N_{ij}!},$$

where the value of the normalization constant $c_1$ results from the solution of
Dirichlet’s integral (see above) and the fact that the integral over $\theta_{1j}, \ldots, \theta_{rj}$
must be 1 (since $f$ is a probability density function).

With this consideration a family of scoring functions suggests itself, which
can be derived as follows: First we normalize the likelihood function, so that the
maximum value of this function becomes 1. This is easily achieved by dividing the
likelihood function by the maximum likelihood estimate raised to the power $N_{ij}$.
Then we introduce an exponent $\alpha_2$, through which we can control the “width”
of the function around the maximum likelihood estimate. Thus, if the exponent
is zero, we get a constant function, if it is one, we get a function proportional to
the likelihood function, and if it approaches infinity, it approaches Dirac pulses
at the maximum likelihood estimate. That is, we get the family:

$$f_{\alpha_2}(\theta_{1j}, \ldots, \theta_{rj}) = c_2 \cdot \left( \prod_{i=1}^{r} \left( \frac{\theta_{ij}}{N_{ij}/N_{.j}} \right)^{N_{ij}} \right)^{\alpha_2} = c_3 \cdot \prod_{i=1}^{r} \theta_{ij}^{\alpha_2 N_{ij}}.$$

$c_2$ and $c_3$ are normalization factors to be chosen in such a way that the integral
over $\theta_{1j}, \ldots, \theta_{rj}$ is 1. Thus we find, using again the solution of Dirichlet’s integral:



$$c_3 = \frac{\Gamma(\alpha_2 N_{.j} + r)}{\prod_{i=1}^{r} \Gamma(\alpha_2 N_{ij} + 1)}.$$

Inserting the derived parameterized density into the function for the probability
$P(B_S, D)$ and evaluating the formula using Dirichlet’s integral yields the family
of scoring functions

$$\mathrm{K2}^{(2)}_{\alpha_2}(A \mid \mathrm{par}(A)) = \prod_{j=1}^{q} \frac{\Gamma(\alpha_2 N_{.j} + r)}{\Gamma((\alpha_2+1) N_{.j} + r)} \prod_{i=1}^{r} \frac{\Gamma((\alpha_2+1) N_{ij} + 1)}{\Gamma(\alpha_2 N_{ij} + 1)}.$$

From the derivation above it is clear that we get the K2 metric for $\alpha_2 = 0$.
Since $\alpha_2$ is, like $\alpha_1$, a kind of data weighting factor, we have a measure with a
stronger tendency towards simpler network structures for $\alpha_2 < 0$ and a measure
with a weaker tendency for $\alpha_2 > 0$. However, in order to keep the argument of
the $\Gamma$-function positive, negative values of $\alpha_2$ cannot be made arbitrarily large.
Actually, due to the behavior of the $\Gamma$-function for arguments less than 1, only
positive values seem to be useful.
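
The two limiting cases of this family can be checked with a brief sketch. It assumes the Gamma-function form of the score that follows from Dirichlet's integral as described above, non-negative values of the exponent, and the hypothetical `counts[j][i]` layout:

```python
from math import lgamma, log

def log2_k2_modified_prior(counts, alpha2=0.0):
    """log2 of the modified-prior score for alpha2 >= 0; counts[j][i] = N_ij."""
    r = len(counts[0])
    total = 0.0
    for row in counts:
        n_dot_j = sum(row)
        # normalization of the prior divided by normalization of prior times likelihood
        total += lgamma(alpha2 * n_dot_j + r) - lgamma((alpha2 + 1.0) * n_dot_j + r)
        total += sum(lgamma((alpha2 + 1.0) * n + 1.0) - lgamma(alpha2 * n + 1.0)
                     for n in row)
    return total / log(2)

counts = [[1, 1]]
print(log2_k2_modified_prior(counts, 0.0))     # K2 metric for this table
print(log2_k2_modified_prior(counts, 1e4))     # close to the Dirac-pulse score
```

For `alpha2 = 0` the value coincides with the K2 metric, and for large `alpha2` it approaches the score obtained with Dirac pulses at the maximum likelihood estimates, which for this table is $\log_2 (1/2)^2 = -2$.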



3.3 Weighted Coding Penalty 

It is well-known that Bayesian estimation is closely related to the minimum de- 
scription length (MDL) principle [18]. Thus it is not surprising that the K2 metric 
can also be justified by means of this principle. The idea is as follows (see e.g. 
[11], where it is described w.r.t. decision tree induction): Suppose the database 
of sample cases is to be transmitted from a sender to a receiver. Both know the
number of attributes, their domains, and the number of cases in the database¹,
but at the beginning only the sender knows the values the attributes are instantiated
with in the sample cases. Since transmission is costly, one tries to code
the values using a minimal number of bits. This can be achieved by exploiting
properties of the value distributions to construct a good coding scheme. However,
the receiver cannot know this coding scheme without being told and thus
the coding scheme has to be transmitted, too. Therefore the total length of the
description of the coding scheme and the description of the values based on the
chosen coding scheme has to be minimized.

¹ A strict application of the MDL principle would assume that these numbers are
unknown to the receiver. However, since they have to be transmitted in any case,
they do not change the ranking and thus are neglected or assumed to be known.

The transmission is carried out as follows: The values of the sample cases 
are transmitted attribute by attribute. That is, at first the values of the first 
attribute are transmitted for all sample cases, then the values of the second at- 
tribute are transmitted, and so on. Thus the transmission of the values of an 
attribute may exploit dependences between this attribute and already trans- 
mitted attributes to code the values more efficiently. Using a coding based on 
absolute value frequencies (for coding based on relative frequencies, see [11,3]) 
and exploiting that the values of a set $\mathrm{par}(A)$ of already transmitted attributes
are known, the following formula can be derived for the length of a description
of the values of attribute $A$:

$$L(A \mid \mathrm{par}(A)) = \log_2 S + \sum_{j=1}^{q} \log_2 \frac{(N_{.j} + r - 1)!}{N_{.j}! \, (r-1)!} + \sum_{j=1}^{q} \log_2 \frac{N_{.j}!}{N_{1j}! \cdots N_{rj}!}.$$



Here $S$ is the number of possible selections of a set $\mathrm{par}(A)$ from the set of already
transmitted attributes. The lower the value of the above function (that is, its
sum over all attributes), the better the corresponding network structure.

The above formula can be interpreted as follows: First we transmit which
subset $\mathrm{par}(A)$ of the already transmitted attributes we use for the coding. We
do so by referring to a code book, in which all possible selections are printed,
one per page. This book has $S$ pages and thus transmitting the page number
takes $\log_2 S$ bits. (This term is usually neglected, since it is the same for all
selections of attributes.) Then we do a separate coding for each instantiation
of the attributes in $\mathrm{par}(A)$. We transmit first the frequency distribution of the
values of the attribute $A$ given the $j$-th instantiation of the attributes in $\mathrm{par}(A)$.
Since there are $N_{.j}$ cases in which the attributes in $\mathrm{par}(A)$ are instantiated with
the $j$-th value combination and since there are $r$ values for the attribute $A$, there
are $\frac{(N_{.j} + r - 1)!}{N_{.j}! \, (r-1)!}$ possible frequency distributions. We assume again that all of these
are printed in a code book, one per page, and transmit the page number. Finally
we transmit the exact assignment of the values of the attribute $A$ to the cases.
Since we already know the frequency of the different values, there are $\frac{N_{.j}!}{N_{1j}! \cdots N_{rj}!}$
possible assignments. Once again we assume these to be printed in a code book,
one per page, and transmit the page number.

It is easy to verify that

$$L(A \mid \mathrm{par}(A)) = -\log_2 \mathrm{K2}(A \mid \mathrm{par}(A))$$

if we neglect the term $\log_2 S$ (see above). Hence minimizing the network score
w.r.t. $L(A \mid \mathrm{par}(A))$ is equivalent to maximizing it w.r.t. $\mathrm{K2}(A \mid \mathrm{par}(A))$.
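
The stated equality can be verified with exact integer arithmetic. The following sketch counts code book pages with binomial and multinomial coefficients and compares the result with the K2 metric computed as an exact fraction; the table is invented, the $\log_2 S$ term is omitted, and `math.comb` requires Python 3.8 or later:

```python
from fractions import Fraction
from math import comb, factorial, log2

def description_length(counts):
    """L(A | par(A)) in bits without the log2 S term; counts[j][i] = N_ij."""
    r = len(counts[0])
    bits = 0.0
    for row in counts:
        n_dot_j = sum(row)
        # page number in the book of all frequency distributions
        bits += log2(comb(n_dot_j + r - 1, r - 1))
        # page number in the book of all assignments (multinomial coefficient)
        assignments = factorial(n_dot_j)
        for n in row:
            assignments //= factorial(n)
        bits += log2(assignments)
    return bits

def k2(counts):
    """K2 metric as an exact fraction."""
    r = len(counts[0])
    score = Fraction(1)
    for row in counts:
        score *= Fraction(factorial(r - 1), factorial(sum(row) + r - 1))
        for n in row:
            score *= factorial(n)
    return score

counts = [[6, 4], [4, 6]]                              # invented example data
score = k2(counts)
minus_log2_k2 = log2(score.denominator) - log2(score.numerator)
print(description_length(counts), minus_log2_k2)
```

Both printed numbers agree up to rounding, which is the equality claimed above.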

The above considerations suggest a third way to introduce a parameter for
controlling the tendency towards simpler network structures. In the MDL view
the tendency results from the need to transmit the coding scheme, the costs of
which can be seen as a penalty for making the network structure more complex:
If the dependences of the attributes do not compensate the costs for transmitting
a more complex coding scheme, fewer parent attributes are selected. Hence the
tendency is mainly due to the term describing the costs for transmitting the
coding scheme and we may control the tendency by weighting this term. In order
to achieve matching ranges of values for the parameters and thus to simplify the
presentation of the experimental results (see below), we write the weighting
factor as $\frac{1}{\alpha_3 + 1}$. Thus we get the following family of scoring functions:



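$$L^{(3)}_{\alpha_3}(A \mid \mathrm{par}(A)) = \frac{1}{\alpha_3 + 1} \sum_{j=1}^{q} \log_2 \frac{(N_{.j} + r - 1)!}{N_{.j}! \, (r-1)!} + \sum_{j=1}^{q} \log_2 \frac{N_{.j}!}{N_{1j}! \cdots N_{rj}!}.$$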







Obviously, for $\alpha_3 = 0$ we have a measure that is equivalent to the K2 metric.
For $\alpha_3 < 0$ we get a measure with a stronger tendency to select simpler network
structures, for $\alpha_3 > 0$ we get a measure with a weaker tendency.
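
How the weight on the coding penalty changes a parent decision can be seen in a toy computation. The sketch assumes the weighting factor $1/(\alpha_3 + 1)$ applied only to the cost of transmitting the frequency distributions; the table and the function name are invented for illustration:

```python
from math import comb, factorial, log2

def weighted_length(counts, alpha3=0.0):
    """Description length in bits with the coding-scheme term weighted by 1 / (alpha3 + 1)."""
    r = len(counts[0])
    penalty = 0.0                                      # cost of the coding scheme
    data = 0.0                                         # cost of the values given the scheme
    for row in counts:
        n_dot_j = sum(row)
        penalty += log2(comb(n_dot_j + r - 1, r - 1))
        assignments = factorial(n_dot_j)
        for n in row:
            assignments //= factorial(n)
        data += log2(assignments)
    return penalty / (alpha3 + 1.0) + data

with_parent = [[6, 4], [4, 6]]                         # weak dependence, invented data
without_parent = [[10, 10]]
for alpha3 in (0.0, 1.0):
    print(alpha3, weighted_length(with_parent, alpha3),
          weighted_length(without_parent, alpha3))
```

Since lower values are better, the parent is rejected for $\alpha_3 = 0$ and accepted for $\alpha_3 = 1$ in this example.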



4 Experimental Results 



We implemented all of the above-mentioned families of scoring functions as part
of INES (Induction of NEtwork Structures), a prototype program for learning
probabilistic networks from a database of sample cases, which was written by the first
author. With this program we conducted several experiments based on the well- 
known ALARM network [2] and the BOBLO network [17]. For all experiments 
we used greedy parent selection w.r.t. a topological order (the search method of 
the K2 algorithm). Of course, other search methods may also be used, but we 
do not expect the results to differ significantly. 

The experiments were carried out as follows: For each network we chose three 
database sizes, namely 1000, 2000, and 5000 tuples for the ALARM network and 
500, 1000, and 2000 tuples for the BOBLO network. For each of these sizes we 
randomly generated ten pairs of databases from the networks. The first database 
of each pair was used to induce a network, the second to test it (see below). For 
each database size we varied the parameters introduced in the preceding section 
from $-0.95$ to 1 (for $\alpha_1$ and $\alpha_3$) and from 0 to 1 (for $\alpha_2$) in steps of 0.05.

The induced networks were evaluated in two different ways: In the first place 
they were compared to the original networks by counting the number of missing 
edges and the number of additional edges. Furthermore they were tested against 
the second database of each pair (see above) by computing the log-likelihood 
(natural logarithm) of this database given the induced networks. For this the 
conditional probabilities of the induced networks were estimated from the first 
database of each pair (i.e., the one the network structure was induced from) with 
Laplace corrected maximum likelihood estimation, i.e., using

$$\forall i, j: \quad \hat{p}_{i|j} = \frac{N_{ij} + 1}{N_{.j} + r}$$

in order to avoid problems with impossible tuples.
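
For a single attribute, the evaluation step can be sketched as follows. This is a simplification that looks at one conditional distribution only (the log-likelihood of a whole network would sum such terms over all attributes), with hypothetical function names and invented counts:

```python
from math import log

def laplace_estimates(counts):
    """Laplace corrected estimates (N_ij + 1) / (N_.j + r) from training counts[j][i]."""
    r = len(counts[0])
    return [[(n + 1) / (sum(row) + r) for n in row] for row in counts]

def log_likelihood(test_counts, probs):
    """Natural-log likelihood of test counts under the estimated conditional probabilities."""
    return sum(n * log(p)
               for row, prob_row in zip(test_counts, probs)
               for n, p in zip(row, prob_row))

train = [[3, 0]]                 # the second value never occurs in the training data
test = [[1, 1]]                  # but it does occur in the test data
probs = laplace_estimates(train)
print(probs, log_likelihood(test, probs))
```

Without the correction the unseen value would get probability 0 and the log-likelihood of the test data would be undefined; with it the result stays finite.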

Fig. 1. Results for the ALARM network (one diagram each for 1000, 2000, and 5000 tuples).



The results w.r.t. the parameters $\alpha_1$ and $\alpha_3$ are shown in figures 1 and 2. The
results for $\alpha_2$, which are less instructive, since this parameter should be positive,
are very similar to the right halves of the diagrams for $\alpha_1$ and $\alpha_3$. Each diagram
contains three curves, which represent averages over the ten pairs of databases:

a: the average number of missing edges, 
b: the average number of additional edges, 
c: the average log-likelihood of the test databases. 

The scale for the number of missing/additional edges is on the left, the scale for
the log-likelihood of the test databases on the right of the diagrams.
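
The structural comparison amounts to two set differences. The sketch below assumes that edges are compared as directed (parent, child) pairs, which is an assumption of this illustration and not stated in the text:

```python
def edge_differences(original, induced):
    """Return (missing, additional) edge counts of an induced network.

    Edges are (parent, child) tuples; comparing them as directed pairs
    is an assumption made for this sketch.
    """
    original, induced = set(original), set(induced)
    missing = len(original - induced)        # edges of the original that were not found
    additional = len(induced - original)     # edges that are not in the original
    return missing, additional

original = {("A", "B"), ("B", "C")}
induced = {("A", "B"), ("A", "C")}
print(edge_differences(original, induced))
```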

All diagrams demonstrate that the tendency of the K2 metric (which corresponds
to $\alpha_k = 0$, $k = 1, 2, 3$) is very close to optimal. However, the diagrams
also indicate that a slightly stronger tendency towards simpler network structures
($\alpha_k < 0$) may lead to even better results. With a slightly stronger tendency
some of the few unnecessary additional edges selected with the K2 metric can be 
suppressed without significantly affecting the log-likelihood of test data (actually 
the log-likelihood value is usually also slightly better with a stronger tendency, 
although this is far from being statistically significant). 

Fig. 2. Results for the BOBLO network (one diagram each for 500, 1000, and 2000 tuples).

It should be noted, though, that in some applications a weaker tendency towards
simpler network structures is preferable. For example, in a cooperation
with DaimlerChrysler, in which we work on fault diagnosis, we faced the problem
that in tests against expert knowledge sometimes dependences of faults on the 
vehicle equipment, which were known to the domain experts, could not be found 
with the K2 metric. Usually this was the case if the dependence was restricted 
to one instantiation of the parent attributes. By adapting the parameters in- 
troduced above, however, these dependences were easily found. We regret that 
details of these results are confidential, so that we cannot present them here. 



5 Conclusions 



In this paper we introduced three modifications of the K2 metric, each of which 
adds a parameter to control the tendency towards simpler network structures. 
The resulting families of scoring functions provided us with means to explore 
empirically the properties of the K2 metric. Our experimental results indicate 
that the tendency strength of the K2 metric is a very good choice, but that a 
slightly stronger tendency towards simpler network structures may lead to even 
better results, although the improvement is only marginal. 


Abstract. This paper deals with representation and solution of asymmetric decision
problems. We describe a new graphical representation called sequential
valuation networks, which is a hybrid of Covaliu and Oliver’s sequential decision
diagrams and Shenoy’s asymmetric valuation networks. Sequential valuation
networks inherit many of the strengths of sequential decision diagrams and
asymmetric valuation networks while overcoming many of their shortcomings.
We illustrate our technique by representing and solving a modified version of
Covaliu and Oliver’s Reactor problem.



1 Introduction 

The goal of this paper is to propose a new graphical technique for representing and 
solving asymmetric decision problems. The new graphical representation is called a 
sequential valuation network and it is a hybrid of sequential decision diagrams (SDDs) 
[3] and asymmetric valuation networks (VNs) [13]. Sequential valuation networks 
adapt the best features from SDDs and asymmetric VNs and provide a fix to some of 
the major shortcomings of these techniques as identified by Bielza and Shenoy [1]. 
The algorithm for solving sequential valuation networks is based on the idea of decomposing
a large asymmetric problem into smaller symmetric sub-problems and then
using a special case of Shenoy’s fusion algorithm to solve the symmetric sub-problems.

In a decision tree representation, a path from the root node to a leaf node is called a 
scenario. A decision problem is said to be asymmetric if there exists a decision tree 
representation such that the number of scenarios is less than the cardinality of the 
Cartesian product of the state spaces of all chance and decision variables. 

There are three types of asymmetry in decision problems — chance, decision, and 
information. First, the state space of a chance variable may vary depending on the 
scenario in which it appears. In the extreme, a chance variable may be non-existent in 
a particular scenario. For example, if a firm decides not to test market a product, we 
are not concerned about the possible results of test marketing. Second, the state space 
of a decision variable may depend on the scenario in which it appears. Again, at the 
extreme, a decision variable may simply not exist for a given scenario. For example, if
we decide not to buy a financial option contract, the decision of exercising the option
on the exercise date does not exist. Finally, the information constraints may depend on
the scenarios. For example, in diagnosing a disease with two symptoms, the order in
which the symptoms are revealed (if at all) may depend on the sequence of the tests
ordered by the physician prior to making a diagnosis. A specific example of information
asymmetry is described in Section 5. Most of the examples of asymmetric decision
problems have focused on chance and decision asymmetry. Information asymmetry
has not been widely studied.

Several graphical techniques have been proposed for representing and solving 
asymmetric decision problems — traditional decision trees [11], combination of influ- 
ence diagrams (IDs) and decision trees [2], contingent influence diagrams [5], influ- 
ence diagrams with distribution trees [14], decision graphs within the ID framework 
[10], asymmetric valuation network representation with indicator valuations [13], 
sequential decision diagrams [3], configuration networks [7], asymmetric influence 
diagrams [9], and valuation networks with coarse valuations [8]. Each of these methods
has some advantages and disadvantages. For a comparison of decision trees,
Smith-Holtzman-Matheson’s influence diagrams, Shenoy’s asymmetric valuation
networks, and Covaliu and Oliver’s sequential decision diagrams, see [1].

Covaliu and Oliver’s SDD representation [3] is a compact and intuitive way of rep- 
resenting the structure of an asymmetric decision problem. One can think of a SDD as 
a clustered decision tree in which each variable appears only once (as in influence 
diagrams and VNs). Also, SDDs model asymmetry without adding dummy states to 
variables. However, the SDD representation depends on influence diagrams to repre- 
sent the probability and utility models. Also, preprocessing may be required in order to 
make the ID representation compatible with the SDD representation so that the for- 
mulation table can be constructed. One unresolved problem is that although a SDD 
and a compatible ID use the same variables, the state spaces of these variables may 
not be the same. The problem of exponential growth of rows in the formulation table 
is another major problem of this method. Finally, this method is unable to cope with 
an arbitrary factorization of the joint utility function. It can only handle either a single 
undecomposed utility function, or a factorization of the joint utility function into fac- 
tors where each factor only includes a single variable. 

Shenoy's asymmetric VN representation [13] is compact in the sense that the model 
is linear in the number of variables. It is also flexible regarding the factorization of the 
joint probability distribution of the random variables in the model — the model works 
for any multiplicative factorization of the joint probability distribution. However, this 
representation technique cannot avoid the creation of artificial states that lead to an 
increased state space for some variables in the model. Some types of asymmetry can- 
not be captured in the VN representation. Also, the asymmetric structure of a decision 
problem is not represented at the graphical level, but instead in the details of the indi- 
cator valuations. 

This paper presents a new graphical representation called a sequential valuation 
network (SVN) that is a hybrid of SDDs and asymmetric VNs. This new graphical 
method combines the strengths of SDDs and VNs, and avoids the weaknesses of either.
We use the graphical ease of SDD representation of the asymmetric structure of a
decision problem, and attach value and probability valuations to variables as in VNs. 
The resulting SVN representation is able to address many of the shortcomings of VNs 
and SDDs as follows. The state spaces of the variables do not include artificial states, 
and all types of asymmetry can be represented. This is true for the Reactor problem 
and we conjecture that these aspects are true of all asymmetric problems. Most of the 
asymmetric structure of a decision problem is represented at the graphical level. A 
SVN does not need a separate graph to represent the uncertainty model. No pre- 
processing is required to represent a decision problem as a SVN, i.e., it is not neces- 
sary to construct a formulation table prior to solving a SVN. Finally, a SVN can easily 
represent any factorization of the joint utility function. 

To solve SVNs, we identify different symmetric sub-problems as paths from the 
source node to the terminal node. Each such path represents a collection of scenarios. 
Finally, we apply a special case of Shenoy’s [12] fusion algorithm for each sub-problem
and solve the global asymmetric problem by solving smaller symmetric sub-problems.
The strategy of breaking down an asymmetric decision problem into several
symmetric sub-problems is also used by [7] and [9].

An outline of the remainder of the paper is as follows. In Section 2, we give a com- 
plete statement of a modified version of the Reactor problem of [3], and describe a 
decision tree representation of it. In Section 3, we represent the same problem using 
SVN representation and in Section 4, we sketch its solution. Finally, in Section 5, we 
conclude by summarizing some strengths of our representation as compared to the 
representations proposed so far. 



2 The Reactor Problem 

An electric utility firm must decide whether to build (D₂) a reactor of advanced design
(a), a reactor of conventional design (c), or no reactor (n). If the reactor is successful,
i.e., there are no accidents, an advanced reactor is more profitable, but it is also riskier.
Past experience indicates that a conventional reactor (C) has probability 0.980 of being
successful (cs), and a probability 0.020 of a failure (cf). On the other hand, an advanced
reactor (A) has probability 0.660 of being successful (as), probability 0.244 of a limited
accident (al), and probability 0.096 of a major accident (am). If the firm builds a
conventional reactor, the profits are $8B if it is a success, and -$4B if there is a failure.
If the firm builds an advanced reactor, the profits are $12B if it is a success, -$6B if
there is a limited accident, and -$10B if there is a major accident. The firm’s utility
function is assumed to be linear in dollars.

Fig. 1. A Probability Model for A and R in the Reactor Problem

Before making the decision to build, the firm has the option to conduct a test
(D₁ = t) or not (D₁ = nt) of the components of the advanced reactor. The test results
(R) can be classified as bad (b), good (g), or excellent (e). The cost of the test is $1B.
The test results are highly correlated with the success or failure of the advanced reac- 
tor (A). Figure 1 shows a causal probability model for A and R in the Reactor problem. 
Notice that if A = as, then R cannot assume the state b. If the test results are bad, then 
as per the probability model, an advanced reactor will result in either a limited or a 
major accident, and consequently, the Nuclear Regulatory Commission will not li- 
cense an advanced reactor. 
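
The probabilities and profits stated in the problem description already suffice to evaluate the two build options when no test is performed. The following sketch computes these expected profits (in billions of dollars); the dictionary layout and names are choices made for this illustration:

```python
# (probability, profit in B$) for each outcome, as stated in the problem description
conventional = {"cs": (0.980, 8.0), "cf": (0.020, -4.0)}
advanced = {"as": (0.660, 12.0), "al": (0.244, -6.0), "am": (0.096, -10.0)}

def expected_profit(outcomes):
    """Expected profit under a utility function that is linear in dollars."""
    return sum(p * profit for p, profit in outcomes.values())

print(expected_profit(conventional))   # about 7.76
print(expected_profit(advanced))       # about 5.496
```

Without the test, the conventional reactor is therefore the better of the two build options, which illustrates why the advanced design is described as riskier.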



2.1 Decision Tree Representation and Solution 

Figure 2 shows a decision tree representation and solution of this problem. The opti- 
mal strategy is as follows. Do the test; build a conventional reactor if test results are 
bad or good, and build an advanced reactor if test results are excellent. The maximum 
expected profit is $8.13B. 

Fig. 2. A Decision Tree Representation and Solution of the Reactor Problem (profits in B$)

The decision tree representation given in Figure 2 successfully captures the asymmetric
structure of the Reactor problem. The product of the cardinalities of the state
spaces of the decision and chance variables is 108, but there are only 21 possible scenarios
in this problem. The decision tree is shown using coalescence, i.e., repeating
sub-trees are shown only once. With coalescence, the number of endpoints is reduced
to 12. Notice that before we can complete the decision tree representation, we need to
compute the required probabilities, i.e., P(R) and P(A | R).
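
The two counts quoted above can be reproduced by enumeration. The sketch below encodes the tree structure as read from the problem description; it assumes that all three states of A remain possible after a good or excellent test result and that the advanced reactor is unavailable after a bad result, which is consistent with the stated number of scenarios:

```python
from math import prod

STATE_SPACES = {
    "D1": ["t", "nt"],
    "R": ["b", "g", "e"],
    "D2": ["a", "c", "n"],
    "A": ["as", "al", "am"],
    "C": ["cs", "cf"],
}

def scenarios():
    """Enumerate root-to-leaf paths of the (uncoalesced) decision tree."""
    paths = []
    for d1 in STATE_SPACES["D1"]:
        # test results are only observed if the test is performed
        results = STATE_SPACES["R"] if d1 == "t" else [None]
        for r in results:
            for d2 in STATE_SPACES["D2"]:
                if r == "b" and d2 == "a":
                    continue                 # no license for an advanced reactor
                if d2 == "a":
                    outcomes = STATE_SPACES["A"]
                elif d2 == "c":
                    outcomes = STATE_SPACES["C"]
                else:
                    outcomes = [None]        # no reactor, no chance node
                for outcome in outcomes:
                    paths.append((d1, r, d2, outcome))
    return paths

cartesian_size = prod(len(states) for states in STATE_SPACES.values())
print(cartesian_size, len(scenarios()))
```

The gap between the two numbers is what makes the problem asymmetric in the sense defined in the introduction.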



3 Sequential Valuation Network Representation 

In this section, we define a new hybrid representation, which we call a sequential 
valuation network. First we start with some notation. 

Valuation Fragments. Suppose $\alpha$ is a utility valuation for $h$, i.e., $\alpha: \Omega_h \to \mathbb{R}$,
where $\Omega_h$ denotes the state space of the variables in $h$, and $\mathbb{R}$ denotes the set of real
numbers. We shall refer to $h$ as the domain of $\alpha$. Suppose $g \subseteq h$ and suppose $\Gamma \subseteq \Omega_g$.
Then $\alpha|\Gamma$ is a function $\alpha|\Gamma: \Gamma \times \Omega_{h-g} \to \mathbb{R}$ such that $(\alpha|\Gamma)(x_g, x_{h-g}) = \alpha(x_g, x_{h-g})$ for all
$x_g \in \Gamma$, and all $x_{h-g} \in \Omega_{h-g}$. We call $\alpha|\Gamma$ a restriction of $\alpha$ to $\Gamma$. We will also refer to
$\alpha|\Gamma$ as a fragment of $\alpha$. We will continue to regard the domain of $\alpha|\Gamma$ as $h$. Notice that
$\alpha|\Omega_g = \alpha$.

Often, $\Gamma$ is a singleton subset of $\Omega_g$, $\Gamma = \{x_g\}$. In this case, we write $\alpha|\Gamma$ as $\alpha|x_g$.
For example, suppose $\alpha$ is a valuation for $\{A, B\}$ where $\Omega_A = \{a_1, a_2\}$ and $\Omega_B = \{b_1, b_2, b_3\}$.
Then, $\alpha$ can be represented as a table as shown in the left hand side of Table 1.
The restriction of $\alpha$ to $a_1$, $\alpha|a_1$, is shown in the right hand side of Table 1. In practice,
valuation fragments will be specified without specifying the full valuation. In the
case of utility valuations, the unspecified values can be regarded as zero utilities
(whenever the utility function decomposes additively), and in the case of probability
valuations, the unspecified values can be regarded as zero probabilities.
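
A valuation fragment can be pictured as a filtered table. The sketch below stores a valuation as a dictionary from state tuples to numbers and restricts it to the states whose sub-configuration lies in a given set; the representation, the function name, and the numeric values are all invented for illustration:

```python
def restrict(valuation, positions, gamma):
    """Restriction of a valuation to the states whose sub-configuration lies in gamma.

    valuation: dict mapping state tuples (ordered by the domain h) to numbers
    positions: indices of the variables in g within each state tuple
    gamma:     set of allowed sub-configurations of the variables in g
    """
    return {state: value for state, value in valuation.items()
            if tuple(state[p] for p in positions) in gamma}

# a valuation for {A, B} with made-up values
alpha = {(a, b): 10 * i + j
         for i, a in enumerate(["a1", "a2"], start=1)
         for j, b in enumerate(["b1", "b2", "b3"], start=1)}

fragment = restrict(alpha, positions=(0,), gamma={("a1",)})
print(fragment)
```

Restricting to the full state space of the selected variables returns the original valuation unchanged, in line with the remark at the end of the definition.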

A complete SVN representation of the Reactor problem is given in Figure 3, Table 2,
and Table 3. The SVN graph consists of six types of nodes: chance, decision, terminal,
indicator, utility and probability. Chance nodes are shown as circles and represent
random variables. In the Reactor problem representation, there are three chance
nodes, R, A, and C. Decision nodes are shown as rectangles and represent decision
variables. In the Reactor problem representation, there are two decision nodes, $D_1$ and $D_2$.
The terminal node is shown as an octagon and is a compact version of the end points of a
decision tree. The terminal node is labeled T in the Reactor problem representation.
Indicator valuations are shown as triangles with a double border, probability valuations are
shown as triangles with a single border, and utility valuations are shown as diamonds.
For further details, see [4].

Table 1. An Example of a Valuation Fragment

  $\Omega_{\{A,B\}}$    $\alpha$                  $\{a_1\} \times \Omega_B$    $\alpha|a_1$
  $a_1, b_1$            $\alpha(a_1, b_1)$        $a_1, b_1$                   $\alpha(a_1, b_1)$
  $a_1, b_2$            $\alpha(a_1, b_2)$        $a_1, b_2$                   $\alpha(a_1, b_2)$
  $a_1, b_3$            $\alpha(a_1, b_3)$        $a_1, b_3$                   $\alpha(a_1, b_3)$
  $a_2, b_1$            $\alpha(a_2, b_1)$
  $a_2, b_2$            $\alpha(a_2, b_2)$
  $a_2, b_3$            $\alpha(a_2, b_3)$

Sequential Valuation Networks: A New Graphical Technique

Fig. 3. A SVN Graphical Representation of the Reactor Problem

The structure of the sub-graph is similar to the SDD graphical representation of [3] 
(with minor differences in the terminal node and the annotations associated with the 
directed edges) and the attached valuations have the same semantics as VNs [13]. 

In the qualitative part, we first define the state spaces of all chance and decision
variables, and then specify the details of the indicator valuations. In the Reactor problem,
$\Omega_{D_1} = \{t, nt\}$, $\Omega_R = \{b, g, e\}$, $\Omega_{D_2} = \{a, c, n\}$,
$\Omega_A = \{as, al, am\}$, and $\Omega_C = \{cs, cf\}$. The indicator valuation $\delta_1|t$
with domain $\{t\}\times\{R, D_2\}$ is a constraint on the choices available to the
decision-maker at $D_2$. This constraint can be specified by listing all states in
$\{t\}\times\Omega_{\{R, D_2\}}$ that are allowed. Thus, the states that are allowed by
$\delta_1|t$ are {(t, b, c), (t, b, n), (t, g, a), (t, g, c), (t, g, n), (t, e, a), (t, e, c),
(t, e, n)}. Similarly, the indicator valuation $\delta_2$ with domain {R, A} can be regarded as
a constraint on the state space $\Omega_{\{R, A\}}$. $\delta_2$ rules out the state (b, as) that
has zero probability. In this paper, we will regard an indicator valuation as a subset of the
state space of its domain. For example, $\delta_1|t \subseteq \{t\}\times\Omega_{\{R, D_2\}}$,
and $\delta_2 \subseteq \Omega_{\{R, A\}}$. During the solution phase, the computations in some
sub-problems are done on the relevant state space (determined by the valuations that are being
processed) constrained by the indicator valuations that are associated with the sub-problem.
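
The view of an indicator valuation as a subset of a state space can be sketched in a few lines of Python. This is only an illustration: the variable names and the helper `allowed` are assumptions introduced here, and the state lists are the ones quoted in the text above.

```python
from itertools import product

# State spaces as quoted in the text for the Reactor problem.
omega_R = ["b", "g", "e"]
omega_D2 = ["a", "c", "n"]
omega_A = ["as", "al", "am"]

# delta_1|t: every state of {t} x Omega_R x Omega_D2 except (t, b, a),
# which is the only state missing from the list given in the text.
delta_1_t = set(product(["t"], omega_R, omega_D2)) - {("t", "b", "a")}

# delta_2: every state of Omega_R x Omega_A except (b, as).
delta_2 = set(product(omega_R, omega_A)) - {("b", "as")}

def allowed(state, indicator):
    """An indicator valuation is used as a plain membership test."""
    return state in indicator
```

With this encoding, restricting a computation to the relevant state space amounts to iterating over the indicator set instead of the full Cartesian product.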

In the quantitative part, we specify the numerical details of the probability and utility
valuations as given in Tables 2 and 3. The numerical specifications have to be consistent with
the graphical and qualitative specifications in the following senses. First, each valuation's
domain is specified in the graphical part. For example, the domain of $\chi$ is C. Therefore,
we have to specify the values of $\chi$ for each state in $\Omega_C$. Second, since the edge
from $\chi$ to C is directed, this means the probability valuation $\chi$ is a conditional for
C given the empty set, i.e., the marginal of $\chi$ for the empty set is a vacuous probability
valuation. Third, if we have probability or utility valuations specified on domains for which
we have indicator valuations, then it is only necessary to specify the values of the
valuations for the states permitted by the indicator valuations. For example, probability
valuation $\rho$ has domain {R, A}. Since we have indicator valuation $\delta_2$ with the same
domain, it is sufficient to specify the values of $\rho$ for the states in $\delta_2$. Thus, we
can regard $\rho$ as a valuation fragment $\rho|\delta_2$. Also, since the edge from $\rho$ to R
is directed, the values of $\rho$ have to satisfy the condition
$\rho^{\downarrow A} = \iota_A$, where $\iota_A$ is the vacuous probability valuation with
domain {A}, i.e., a valuation whose values are identically one. Fourth, it is sufficient to
specify values of utility or probability valuations for those states that are allowed by the
annotations on the edges between variables. For example, consider the utility valuation
fragment $\upsilon_2|a$. The domain of this valuation is $\{D_2, A\}$. However, the annotation
on the edge from $D_2$ to A tells us that all scenarios that include variable A have
$D_2 = a$. Therefore, it is sufficient to specify $\upsilon_2$ for all states in
$\{a\}\times\Omega_A$. Similarly, it is sufficient to specify $\upsilon_3|c$ for
$\{c\}\times\Omega_C$, and sufficient to specify $\delta_1|t$ for
$\{t\}\times\Omega_{\{R, D_2\}}$. Utility valuation $\upsilon_4|n$ is only specified for
$D_2 = n$. Notice that when $D_2 = n$, the next node in the SVN is the terminal node T.
Therefore, $\upsilon_4$ cannot include either A or C in its domain.

Table 2. Utility Valuation Fragments in the Reactor Problem

Table 3. Probability Valuation Fragments in the Reactor Problem

Utility valuations $\upsilon_1$, $\upsilon_2|a$, $\upsilon_3|c$, and $\upsilon_4|n$ are additive
factors of the joint utility function, and probability valuations $\chi$, $\alpha$, and $\rho$
are multiplicative factors of the joint
probability distribution. In the Reactor problem, we have a factorization of the joint 
probability distribution into conditionals, i.e., a Bayes net model. But this is not a 
requirement of the sequential valuation network representation. As we will see in the 
next section, the SVN solution technique will work for any multiplicative factorization 
of the joint probability distribution. 
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4 Solving a SVN Representation 

The main idea of the SVN solution method is to recursively decompose the problem 
into smaller sub-problems until the sub-problems are symmetric, then to solve the 
symmetric sub-problems, using a special case of the symmetric fusion algorithm [12]. 
Finally, the solutions to the sub-problems are recursively combined to obtain the solu- 
tion to the original problem. We begin with some notation. 



4.1 Combination 

Consider two utility valuations $\psi_1$ for $h_1$ and $\psi_2$ for $h_2$. As defined in [12],
we combine utility valuations using pointwise addition assuming an additive factorization of
the joint utility function. In the SVN method, each sub-problem deals with valuation
fragments that are relevant to the sub-problem. We start with defining combination of
utility fragments.

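Case 1. [Combination of utility fragments] Suppose $g_1 \subseteq h_1$, and
$g_2 \subseteq h_2$, and consider two utility fragments $\psi_1|\Gamma_1$ and
$\psi_2|\Gamma_2$ where $\Gamma_1 \subseteq \Omega_{g_1}$, and $\Gamma_2 \subseteq \Omega_{g_2}$.
Let $\Gamma$ denote
$((\Gamma_1\times\Omega_{g_1\cup g_2-g_1})\cup(\Gamma_2\times\Omega_{g_1\cup g_2-g_2}))$. The
combination of $\psi_1|\Gamma_1$ and $\psi_2|\Gamma_2$, written as
$(\psi_1|\Gamma_1)\otimes(\psi_2|\Gamma_2)$, is a utility valuation $\psi$ for $h_1\cup h_2$
restricted to $\Gamma$ given by

\[
(\psi|\Gamma)(y) =
\begin{cases}
(\psi_1|\Gamma_1)(y^{\downarrow g_1}, y^{\downarrow h_1-g_1}) + (\psi_2|\Gamma_2)(y^{\downarrow g_2}, y^{\downarrow h_2-g_2}) & \text{if } y^{\downarrow g_1} \in \Gamma_1 \text{ and } y^{\downarrow g_2} \in \Gamma_2 \\
(\psi_1|\Gamma_1)(y^{\downarrow g_1}, y^{\downarrow h_1-g_1}) & \text{if } y^{\downarrow g_1} \in \Gamma_1 \text{ and } y^{\downarrow g_2} \notin \Gamma_2 \\
(\psi_2|\Gamma_2)(y^{\downarrow g_2}, y^{\downarrow h_2-g_2}) & \text{if } y^{\downarrow g_1} \notin \Gamma_1 \text{ and } y^{\downarrow g_2} \in \Gamma_2
\end{cases}
\]

for all $y \in \Gamma\times\Omega_{(h_1\cup h_2)-(g_1\cup g_2)}$.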

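Case 2. [Combination of a utility fragment and a probability fragment] Suppose
$g_1 \subseteq h_1$, $g_2 \subseteq h_2$, and consider utility fragment $\psi_1|\Gamma_1$ and
probability fragment $\psi_2|\Gamma_2$ where $\Gamma_1 \subseteq \Omega_{g_1}$ and
$\Gamma_2 \subseteq \Omega_{g_2}$. Let $\Gamma$ denote
$((\Gamma_1\times\Omega_{g_1\cup g_2-g_1})\cap(\Gamma_2\times\Omega_{g_1\cup g_2-g_2}))$. The
combination of $\psi_1|\Gamma_1$ and $\psi_2|\Gamma_2$, written as
$(\psi_1|\Gamma_1)\otimes(\psi_2|\Gamma_2)$, is a utility valuation $\psi$ for $h_1\cup h_2$
restricted to $\Gamma$ given by:

\[
(\psi|\Gamma)(y) = (\psi_1|\Gamma_1)(y^{\downarrow g_1}, y^{\downarrow h_1-g_1})\,(\psi_2|\Gamma_2)(y^{\downarrow g_2}, y^{\downarrow h_2-g_2}) \quad \text{if } y^{\downarrow g_1} \in \Gamma_1 \text{ and } y^{\downarrow g_2} \in \Gamma_2,
\]

and 0 otherwise, for all $y \in \Gamma\times\Omega_{(h_1\cup h_2)-(g_1\cup g_2)}$.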

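Case 3. [Combination of probability fragments] Suppose $g_1 \subseteq h_1$, and
$g_2 \subseteq h_2$, and consider probability fragments $\psi_1|\Gamma_1$ and
$\psi_2|\Gamma_2$ for $h_1$ and $h_2$, respectively, where $\Gamma_1 \subseteq \Omega_{g_1}$
and $\Gamma_2 \subseteq \Omega_{g_2}$. Let $\Gamma$ denote
$((\Gamma_1\times\Omega_{g_1\cup g_2-g_1})\cap(\Gamma_2\times\Omega_{g_1\cup g_2-g_2}))$. The
combination of $\psi_1|\Gamma_1$ and $\psi_2|\Gamma_2$, written as
$(\psi_1|\Gamma_1)\otimes(\psi_2|\Gamma_2)$, is a probability valuation $\psi$ for
$h_1\cup h_2$ restricted to $\Gamma$ given by

\[
(\psi|\Gamma)(y) = (\psi_1|\Gamma_1)(y^{\downarrow g_1}, y^{\downarrow h_1-g_1})\,(\psi_2|\Gamma_2)(y^{\downarrow g_2}, y^{\downarrow h_2-g_2}) \quad \text{if } y^{\downarrow g_1} \in \Gamma_1 \text{ and } y^{\downarrow g_2} \in \Gamma_2,
\]

and 0 otherwise, for all $y \in \Gamma\times\Omega_{(h_1\cup h_2)-(g_1\cup g_2)}$. The Reactor
problem described in this paper does not require this case of combination.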

Note that the combination of two utility valuations is a utility valuation; the
combination of two probability valuations is a probability valuation; and the combination
of a utility and a probability valuation is a utility valuation.
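
The additive and multiplicative cases of combination can be illustrated with a small Python sketch. It is a simplification made for illustration, not the authors' implementation: a fragment is stored as a pair (domain, table) whose table lists only the states on which the fragment is specified, and the names `project`, `combine`, `frames`, `cond_A_given_R`, and the toy variables X and Y are assumptions introduced here. The numbers for $\upsilon_2|a$ and for the conditional of A given R are those printed in Table 4.

```python
from itertools import product

def project(state, domain, sub_domain):
    """Project a state (a tuple aligned with `domain`) onto `sub_domain`."""
    return tuple(state[domain.index(v)] for v in sub_domain)

def combine(f1, f2, frames, kind):
    """Combine two fragments given as (domain, {state: value}) pairs.

    kind="add": utility + utility, defined wherever at least one fragment is.
    kind="mul": utility * probability or probability * probability,
                defined only where both fragments are.
    """
    dom1, val1 = f1
    dom2, val2 = f2
    domain = tuple(dict.fromkeys(dom1 + dom2))  # ordered union of the two domains
    result = {}
    for state in product(*(frames[v] for v in domain)):
        v1 = val1.get(project(state, domain, dom1))
        v2 = val2.get(project(state, domain, dom2))
        if kind == "add" and (v1 is not None or v2 is not None):
            result[state] = (v1 or 0) + (v2 or 0)
        elif kind == "mul" and v1 is not None and v2 is not None:
            result[state] = v1 * v2
    return domain, result

# Utility fragment upsilon_2|a and the conditional for A given R (Table 4 values).
frames = {"D2": ["a"], "A": ["as", "al", "am"], "R": ["b", "g", "e"],
          "X": ["x1", "x2"], "Y": ["y1", "y2"]}
u2_a = (("D2", "A"), {("a", "as"): 12, ("a", "al"): -6, ("a", "am"): -10})
cond_A_given_R = (("R", "A"), {
    ("b", "al"): 0.700, ("b", "am"): 0.300,
    ("g", "as"): 0.400, ("g", "al"): 0.460, ("g", "am"): 0.140,
    ("e", "as"): 0.900, ("e", "al"): 0.060, ("e", "am"): 0.040,
})
phi_domain, phi = combine(u2_a, cond_A_given_R, frames, "mul")

# Toy additive example with two hypothetical single-variable utility fragments.
toy_domain, toy = combine((("X",), {("x1",): 1.0}), (("Y",), {("y1",): 2.0}), frames, "add")
```

Because the state (b, as) is absent from the probability fragment, the product is simply not defined there, which mirrors the role of the indicator valuation $\delta_2$.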

As for the marginalization and division operations, the SVN method uses the same 
marginalization and division operations as defined in [12]. For further details, see [4]. 



4.2 Tagging 

The recursive algorithm of solving lower level sub-problems and sending the results to
an upper level sub-problem requires the use of a concept that we call tagging. Suppose
$\psi$ is a utility valuation with domain h, and suppose $X \notin h$. Tagging $\psi$ by
X = x is denoted by $\psi\otimes(\iota_X|x)$, where $\iota_X|x$ is the vacuous utility
valuation with domain {X} restricted to X = x. A vacuous utility valuation is a valuation
that is identically zero. This operation basically extends the domain of $\psi$ from h to
$h\cup\{X\}$ without changing the values of $\psi$.
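
As a minimal sketch of tagging, assume a utility fragment is stored as a pair (domain, table) with states as tuples aligned with the domain. The function name `tag` and the example numbers are hypothetical; adding the vacuous (identically zero) utility leaves every value unchanged, so only the keys grow by one coordinate.

```python
def tag(fragment, variable, value):
    """Extend a utility fragment's domain with `variable` fixed at `value`.

    Combining with a vacuous utility valuation adds zero, so values are unchanged.
    """
    domain, values = fragment
    if variable in domain:
        raise ValueError("tagging expects a variable outside the fragment's domain")
    return domain + (variable,), {state + (value,): u for state, u in values.items()}

# Hypothetical fragment on {R}, tagged by D2 = a.
fragment = (("R",), {("b",): -7.2, ("g",): 0.5})
tagged_domain, tagged_values = tag(fragment, "D2", "a")
```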



4.3 The Fusion Algorithm 

The details of the fusion algorithm are given in [12]. In the context of sequential 
valuation networks, the fusion algorithm is the same as rollback in decision trees. 
Fusion with respect to decision variables is similar to the “folding back” operation in 
decision trees [11] and fusion with respect to chance variables is similar to the 
“averaging out” operation in decision trees [11]. Further details of the fusion algorithm 
for sequential valuation networks are found in [4] . 



4.4 Decomposition of the Problem 

Starting from the SVN graphical representation, we decompose the decision problem 
into symmetric sub-problems. The symmetric sub-problems are identified by enumer- 
ating all distinct directed paths and sub-paths from the source node to the terminal 
node in the SVN graphical representation. 

Variables. We start with the root node, say S. Next we identify all directed arcs in 
the SVN that lead out of the source node S. For each directed arc, say to variable X, 
we create a new sub-problem consisting of variables S and X on the path from the 
source node to variable X. We retain the annotation on the edges. We recursively pro- 
ceed in this manner until all paths and sub-paths have been enumerated. Notice that 
the terminal node is not a variable and we do not include it in any sub-problem. The 
resulting directed tree is called a “decomposition tree.” Figure 4 shows the
decomposition tree that is constructed for the reactor problem.

Fig. 4. The Decomposition Tree for the Reactor Problem
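
The enumeration of paths and sub-paths can be sketched as follows. The edge list is an assumption reconstructed from the textual description of the Reactor SVN (the figure itself is not reproduced here), edge annotations are ignored, and the function name `decomposition_paths` is hypothetical. The sketch also assumes the graph has no directed cycles.

```python
# Assumed adjacency list for the Reactor SVN; "T" is the terminal node.
edges = {
    "D1": ["R", "D2"],
    "R": ["D2"],
    "D2": ["A", "C", "T"],
    "A": ["T"],
    "C": ["T"],
}

def decomposition_paths(source, edges, terminal="T"):
    """Return every directed path prefix starting at `source`, skipping the terminal node."""
    paths = []
    stack = [(source,)]
    while stack:
        path = stack.pop()
        paths.append(path)
        for nxt in edges.get(path[-1], []):
            if nxt != terminal:  # the terminal node is not a variable
                stack.append(path + (nxt,))
    return sorted(paths, key=lambda p: (len(p), p))

sub_problems = decomposition_paths("D1", edges)
```

Under these assumptions the enumeration yields eight variable sets, one per sub-problem, which agrees with the eight sub-problems mentioned for the Reactor problem.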



Utility and Indicator Valuations. We start at the root node, say S, of the decomposition
tree with the set of all utility valuation fragments included in the SVN representation.
All valuation fragments whose domains are included in the set of variables associated with
the sub-problem are associated with this sub-problem. The valuations that are not are
passed on to the child sub-problems suitably decomposed as per the annotation on the edges
leading to the child sub-problems. This is recursively repeated.

In the Reactor problem, we start with utility and indicator valuations $\upsilon_1$,
$\upsilon_2|a$, $\upsilon_3|c$, $\upsilon_4|n$, $\delta_1|t$, and $\delta_2$. Valuation
$\upsilon_1$ with domain $\{D_1\}$ is associated with Sub-problem 8. Of the remaining
valuations, only $\delta_1|t$ has $D_1$ in its domain. Since there is no fragment of
$\delta_1|t$ that has $D_1 = nt$, Sub-problem 7 receives valuations $\upsilon_2|a$,
$\upsilon_3|c$, $\upsilon_4|n$, $\delta_1|t$, and $\delta_2$. Sub-problem 6 receives
valuations $\upsilon_2|a$, $\upsilon_3|c$, $\upsilon_4|n$, and $\delta_2$.

This process of associating utility and indicator valuations with sub-problems con- 
tinues recursively as above. The resulting distribution of utility and indicator valua- 
tions in the sub-problems is shown in Figure 4. 

Probability Valuations. We start by assuming that we have a factorization of the joint
probability distribution for all chance variables in the problem. In the Reactor problem,
for example, the joint probability distribution $\tau$ for {C, A, R} is given by
$\tau = \chi\otimes\alpha\otimes\rho$.

We recursively compute the probability valuation associated with a leaf sub-problem that
ends with a chance variable, say $C_m$, as follows. Let $\Gamma = \{C_1, ..., C_m\}$ denote
the chance variables on a path from the source node to the leaf node whose last variable is
$C_m$, and let $P = \{\pi_1, ..., \pi_k\}$ denote the set of probability potentials with
domains $h_1, ..., h_k$, respectively, such that $(\pi_1\otimes ... \otimes\pi_k)$ is the
joint distribution for the chance variables in $\Gamma$. The probability valuation
associated with the leaf sub-problem whose last variable is $C_m$ is given by
$\kappa/\kappa^{-C_m}$, where $\kappa = \otimes\{\pi_j \mid C_m \in h_j\}$. Furthermore, the
set of probability valuations associated with the set of chance variables
$\Gamma-\{C_m\}$ is $\{\pi_j \mid C_m \notin h_j\}\cup\{\kappa^{-C_m}\}$, i.e.,
$(\otimes\{\pi_j \mid C_m \notin h_j\})\otimes\kappa^{-C_m}$ is the joint distribution for the
chance variables in $\Gamma-\{C_m\}$. Thus, we can recursively compute the probability
valuations associated with the other sub-problems whose last variable is a chance node. It
follows from Lauritzen and Spiegelhalter [1988] that $\kappa/\kappa^{-C_m}$ is the
conditional probability distribution for $C_m$ given the variables in $\Gamma-\{C_m\}$. For
further details on how the sub-problems are populated with indicator, utility, and
probability valuations, see [4].



4.5 Solving the Sub-problems 

We start with solving the leaf sub-problems. After solving a sub-problem (as per the
definition of fusion stated in [12]), we pass the resulting utility valuation fragment to
its parent sub-problem and delete the sub-problem. In passing the utility valuation
fragment to the parent sub-problem, if the domain of the utility valuation fragment
does not include any variables in the parent sub-problem, we tag the utility valuation
with the value of the last variable in the parent sub-problem that is in the annotation.
We recursively continue this procedure until all sub-problems are solved.

Consider the decomposition of the Reactor problem into the eight sub-problems as
shown in Figure 4. Consider Sub-problem 1 consisting of valuation fragments
$\upsilon_2|a$, $(\rho\otimes\alpha)/(\rho\otimes\alpha)^{-A}$ and $\delta_2$. We fuse the
valuation fragments with respect to A using the definition of fusion from [12].

\[
\mathrm{Fus}_A\{\upsilon_2|a,\ (\alpha\otimes\rho)/(\alpha\otimes\rho)^{-A}\}
= \{[\upsilon_2|a \otimes (\alpha\otimes\rho)/(\alpha\otimes\rho)^{-A}]^{-A}\}
= \{\upsilon_5|a\}.
\]

The resulting utility valuation $\upsilon_5|a$ is sent to parent Sub-problem 5. Since
$\upsilon_5|a$ includes $D_2$ in its domain, there is no need for tagging. All computations
are done on relevant state spaces as constrained by indicator valuation $\delta_2$. The
details of the computation are shown in Table 4. The solutions to the remainder of the
sub-problems are given in [4].
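
The fusion with respect to A in Sub-problem 1 amounts to averaging out A, and can be checked with a short Python sketch. The function name `fuse_chance` is hypothetical, and the inputs are the three-decimal values printed in Table 4, so the results agree with the last column of that table only up to rounding (the table reports 0.649 and 10.043 for R = g and R = e).

```python
# upsilon_2|a over the states of A, and the conditional for A given R (Table 4 values).
u2_a = {"as": 12, "al": -6, "am": -10}
p_A_given_R = {
    "b": {"al": 0.700, "am": 0.300},              # (b, as) is ruled out by delta_2
    "g": {"as": 0.400, "al": 0.460, "am": 0.140},
    "e": {"as": 0.900, "al": 0.060, "am": 0.040},
}

def fuse_chance(utility, conditional):
    """Average out the chance variable: sum of utility * probability over its states."""
    return {
        r: sum(utility[a] * p for a, p in dist.items())
        for r, dist in conditional.items()
    }

u5_a = fuse_chance(u2_a, p_A_given_R)
```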



5 Summary and Conclusions 

The main goal of this paper is to propose a new representation and solution technique 
for asymmetric decision problems. 

The advantages of SVNs over SDDs are as follows. SVNs do not require a separate
influence diagram to represent the uncertainty model. SVNs can represent a more
general uncertainty model than SDDs, which like influence diagrams assume a Bayes
net model of uncertainties. All asymmetries can be represented in SVNs. This is not
true for SDDs. For example, in the Reactor problem, the impossibility of R = b when
A = as is not represented in a SDD representation of the problem. SVNs do not require
a separate formulation table representation as in SDDs. Finally, SVNs can handle any
factorization of the joint utility function whereas SDDs as currently described can only
be used with either an undecomposed joint utility function or with a factorization of
the joint utility function into singleton factors.

Table 4. The Details of Solving Sub-problem 1

  $\{a\}\times\delta_2$   $\upsilon_2|a$   $(\alpha\otimes\rho)/(\alpha\otimes\rho)^{-A}$   $\upsilon_2|a\otimes(\alpha\otimes\rho)/(\alpha\otimes\rho)^{-A} = \varphi$   $\varphi^{-A} = \upsilon_5|a$
  a, b, al                -6               0.700                                            -4.200                                                                      -7.200
  a, b, am                -10              0.300                                            -3.000
  a, g, as                12               0.400                                            4.800                                                                       0.649
  a, g, al                -6               0.460                                            -2.760
  a, g, am                -10              0.140                                            -1.400
  a, e, as                12               0.900                                            10.800                                                                      10.043
  a, e, al                -6               0.060                                            -0.360
  a, e, am                -10              0.040                                            -0.400

The advantages of SVNs over VNs are as follows. SVNs represent most of the 
asymmetry at the graphical level (some asymmetry is represented in the details of the 
indicator valuations) whereas in the case of VNs, all asymmetry is represented in the 
details of the indicator valuations. The state spaces of chance and decision nodes in 
SVNs do not include dummy states. All types of asymmetry can be represented in
SVNs whereas VNs cannot represent some types of asymmetry. Finally, the modeling 
of probability distributions in SVNs is as intuitive as in influence diagrams (assuming 
we are given a Bayes net model for the joint probability distribution). 

One main advantage of the SVN technique is that we do not need to introduce 
dummy states for chance or decision variables. To see why this is important, we will 
describe a simple example called Diabetes diagnosis. Consider a physician who is 
trying to diagnose whether or not a patient is suffering from Diabetes. Diabetes has 
two symptoms, glucose in urine, and glucose in blood. Assume we have a Bayes net
model for the three variables, Diabetes (D), glucose in blood (B) and glucose in urine
(U), in which the joint distribution for the three variables P(D, B, U) factors into
three conditionals, P(D), P(B | D), and P(U | D, B). Furthermore, assume that D has
two states, d for Diabetes is present, and ~d for Diabetes is absent, U has two states, u 
for elevated glucose levels in urine, and ~u for normal glucose level in urine, and B 
has two states, b for elevated glucose levels in blood, and ~b for normal glucose level 
in blood. The physician first decides, FT (first test), whether to order a urine test (ut) 
or a blood test (bt) or no test (nt). After the physician has made this decision and ob- 
served the results (if any), she next has to decide whether or not to order a second test 
(ST). The choices available for the second test decision depend on the decision made 
at FT. If FT = bt, then the choices for ST are either ut or nt. If FT = ut, then the 
choices for ST are either bt or nt. Finally, after the physician has observed the results 
of the second test (if any), she then has to decide whether to treat the patient for Dia- 
betes or not. As described so far, the problem has three chance variables, D, U, B, and 
three decision variables FT (first test), ST (second test), and TD (treat for Diabetes). 
Using the SVN technique, one can represent this problem easily without introducing
any more variables or any dummy states. A SVN graphical representation is shown in
Figure 5. In this figure, the indicator valuation fragment $\iota\,|\,FT = \{bt, ut\}$
represents a constraint on ST as described above, the utility valuations $\kappa_1$,
$\kappa_2$, and $\kappa_3$ represent a factorization of the total cost of diagnosing and
treating the patient for Diabetes, and the probability valuations $\delta$ = P(D),
P(B | D), and $\upsilon$ = P(U | B, D) represent a factorization of the joint probability
distribution into conditionals specified by the Bayes net model. Notice that the SVN
graphical representation has several directed cycles. However, these directed cycles are
disallowed by the annotations on the directed edges and the indicator valuation $\iota$,
which forbids, e.g., FT = bt, ST = bt, and also FT = ut, ST = ut.
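
The constraint carried by the indicator valuation on (FT, ST) can be written down directly. The following sketch uses a hypothetical dictionary `allowed_second_test`; the text only spells out the cases FT = bt and FT = ut, so no entry is assumed for FT = nt.

```python
# Choices for the second test (ST) given the first test (FT), as stated in the text.
allowed_second_test = {
    "bt": {"ut", "nt"},
    "ut": {"bt", "nt"},
}

def is_allowed(ft, st):
    """True if the pair (FT, ST) is permitted by the indicator valuation fragment."""
    return st in allowed_second_test.get(ft, set())
```

Repeating the same test twice is rejected, which is what removes the directed cycles between the two test results in the graphical representation.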

Representing this problem using Smith-Holtzman-Matheson’s asymmetric influ- 
ence diagrams [14] or Shenoy’s asymmetric valuation networks [13] is possible but 
only after either introducing additional variables or introducing dummy states for the 
existing variables. This is because if one uses the existing variables, the modeling of 
information constraints would depend on the FT decision. If FT = bt, then the true 
state of B is revealed prior to making the ST decision, and the true state of U is un- 
known when the ST decision is made. However if FT = ut, then the true state of U is 
known prior to making the ST decision and the true state of B is unknown when the ST 
decision is made. We call this aspect of the decision problem information asymmetry. 
Using either traditional influence diagrams or valuation networks, it is not possible to 
model this information asymmetry without either introducing additional variables or 
introducing dummy states for existing variables. In either of these cases, the modeling 
will need to adapt the Bayes net to a model that includes additional variables or 
dummy states or both. We leave the details of representing the Diabetes diagnosis 
problem using either influence diagrams or valuation networks or some other tech- 
nique to the ingenuity of the reader. 




Fig. 5. A SVN Representation of the Diabetes Diagnosis Problem 
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Abstract. In possibility theory, there are two kinds of possibilistic 
causal networks, depending on whether the possibilistic conditioning is based on the
minimum or the product operator. Product-based possibilistic networks 
share the same practical and theoretical features as Bayesian networks. 
In this paper, we focus on min-based causal networks and propose a 
propagation algorithm for such networks. The basic idea is first to trans- 
form the initial network only into a moral graph. Then, two different 
procedures, called stabilization and checking consistency, are applied to 
compute the possibility degree of any variable of interest given some 
evidence. 



1 Introduction 

Graphical models are knowledge representation tools commonly used by an in- 
creasing number of researchers, particularly from the Artificial Intelligence and 
Statistics communities. The reason for the success of graphical models is their 
capacity of representing and handling independence relationships, which are cru- 
cial for an efficient management and storage of the information. 

In possibility theory there are two different ways to define the counterpart 
of Bayesian networks. This is due to the existence of two definitions of possi- 
bilistic conditioning : product-based and min-based conditioning. When we use 
the product form of conditioning, we get a possibilistic network close to the 
probabilistic one sharing the same features and having the same theoretical and 
practical results [1] [5] [6] which is not the case with min-based networks. 

In this paper we focus on min-based possibilistic directed graphs and propose 
a possibilistic inference algorithm that makes it possible to determine the possibility
degree of any variable of interest given some evidence. Our aim is to avoid a
direct adaptation of probabilistic propagation algorithms [7] [9] and especially 
the transformation of the initial network into a junction tree which is known to 
be a hard problem [2] . 

S. Benferhat and P. Besnard (Eds.): ECSQARU 2001, LNAI 2143, pp. 266-277, 2001.
© Springer-Verlag Berlin Heidelberg 2001

The proposed propagation algorithm works in two steps. First, the initial network is
converted into a moral graph where cycles are easily handled, in the propagation
algorithm, due to the idempotency of the minimum operator. Then, possibility degrees of
variables of interest are computed via message passing. More precisely, we first stabilize
the moral graph and then check its consistency.

Section 2 gives a brief background on possibility theory. Section 3 introduces the notions
of $\alpha$-normalized min-based directed possibilistic graphs and $\alpha$-normalized
possibilistic moral graphs. Section 4 develops the propagation algorithm when no evidence
is available. Lastly, Section 5 considers the case of integrating the evidence.



2 Background on Possibility Theory 

Let $V = \{A_1, A_2, ..., A_N\}$ be a set of variables. We denote by
$D_A = \{a_1, .., a_n\}$ the domain associated with the variable A. By a we denote any
instance of A. $\Omega = \times_{A_i \in V} D_{A_i}$ denotes the universe of discourse, which
is the Cartesian product of all variable domains in V. Each element $\omega \in \Omega$ is
called a state of $\Omega$. $\omega[A]$ denotes the instance in $\omega$ of the variable A.

In the following, we only give a very brief reminder of possibility theory (for
more details see [3]).

Possibility distribution and possibility measure: The basic concept in possibility theory
is the notion of possibility distribution, denoted by $\pi$, which is a mapping from
$\Omega$ to the unit interval [0, 1]. A possibility distribution aims to encode our
knowledge about an ill-known world: $\pi(\omega) = 1$ means that $\omega$ is completely
possible and $\pi(\omega) = 0$ means that $\omega$ cannot be the real world. Given a
possibility distribution $\pi$ defined on the universe of discourse $\Omega$, we can define
a mapping grading the possibility measure of an event $\phi \subseteq \Omega$ by:

\[ \Pi(\phi) = \max_{\omega \in \phi} \pi(\omega). \tag{1} \]

Definition 1 A possibility distribution $\pi$ is said to be $\alpha$-normalized, if its
normalization degree $h(\pi)$ is equal to $\alpha$, namely:

\[ \alpha = h(\pi) = \max_{\omega} \pi(\omega) \tag{2} \]

If $\alpha = 1$, then $\pi$ is simply said to be normalized.
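
Equations (1) and (2) translate directly into code. In the sketch below a possibility distribution is a dictionary from states to degrees; the function names and the three-state distribution are hypothetical, and the possibility of the empty event is taken to be 0.

```python
def possibility(pi, event):
    """Pi(event): maximum of pi over the states of the event (0 for the empty event)."""
    return max((pi[w] for w in event), default=0.0)

def normalization_degree(pi):
    """h(pi): maximum of pi over all states."""
    return max(pi.values())

# Hypothetical distribution over three states.
pi = {"w1": 1.0, "w2": 0.7, "w3": 0.2}
```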



Possibilistic conditioning: In the possibilistic setting conditioning consists in
modifying our initial knowledge, encoded by the possibility distribution $\pi$, by the
arrival of a new sure piece of information $\phi \subseteq \Omega$. The initial
distribution $\pi$ is then transformed into a new one denoted by
$\pi' = \pi(\cdot \mid \phi)$. In possibility theory there are two possible definitions of
conditioning, one based on the product and the other on the minimum. In this paper, given a
normalized possibility distribution $\pi$, we use min-based conditioning proposed in [8] [4]
and defined by:

\[
\Pi(\psi \mid \phi) =
\begin{cases}
\Pi(\psi \wedge \phi) & \text{if } \Pi(\psi \wedge \phi) < \Pi(\phi) \\
1 & \text{otherwise}
\end{cases}
\tag{3}
\]

When $\pi$ is subnormalized, the degree 1 in (3) is replaced by $\alpha = h(\pi)$.
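
Min-based conditioning as in (3) can be sketched at the level of events, with events represented as Python sets of states. The function names and the example distribution are assumptions made for illustration; the top degree is taken to be $h(\pi)$, which equals 1 when the distribution is normalized.

```python
def possibility(pi, event):
    """Pi(event): maximum of pi over the states of the event (0 for the empty event)."""
    return max((pi[w] for w in event), default=0.0)

def min_conditioning(pi, psi, phi):
    """Pi(psi | phi) following (3), with the degree 1 replaced by h(pi) in general."""
    top = max(pi.values())
    joint = possibility(pi, psi & phi)
    return joint if joint < possibility(pi, phi) else top

# Hypothetical normalized distribution and evidence phi = {w2, w3}.
pi = {"w1": 1.0, "w2": 0.7, "w3": 0.2}
phi = {"w2", "w3"}
```

In this example the best state compatible with the evidence is raised to the top degree, while the less possible state keeps its original degree.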
3 $\alpha$-Normalized Possibilistic Graphs and $\alpha$-Normalized
Moral Graphs

We first need to introduce the notions of $\alpha$-normalized possibilistic graphs and
possibilistic moral graphs which will be used later.



3.1 $\alpha$-Normalized Min-Based Directed Possibilistic Graphs

An $\alpha$-normalized min-based directed possibilistic graph over a set of variables V,
denoted by $\Pi G$, is a DAG (Directed Acyclic Graph) where nodes represent variables and
edges encode the links between the variables. The set of parents of a node A is denoted by
$U_A$. For every root node A ($U_A = \emptyset$), uncertainty is represented by the a priori
possibility degree $\Pi(a)$ of each instance $a \in D_A$, such that
$\max_a \Pi(a) = \alpha$. For the rest of the nodes ($U_A \neq \emptyset$), uncertainty is
represented by the conditional possibility degree $\Pi(a \mid u_A)$ of each instance
$a \in D_A$ and $u_A \in D_{U_A}$. These conditional distributions satisfy the following
normalization condition: $\max_a \Pi(a \mid u_A) = \alpha$, for any $u_A$. When
$\alpha = 1$, we recover classical possibilistic networks.

Given all the a priori and conditional possibilities, the joint distribution relative to
the set V, denoted by $\pi_D$, is expressed by the following min-based chain rule:

\[ \pi_D(A_1, .., A_N) = \min_{i=1..N} \Pi(A_i \mid U_{A_i}) \tag{4} \]

An important property of $\alpha$-normalized graphs is:

Proposition 1 Let $\Pi G$ be an $\alpha$-normalized min-based possibilistic graph. Let
$\pi_D$ be the joint distribution computed from (4). Then, $\pi_D$ is $\alpha$-normalized (in
the sense of Definition 1), namely $h(\pi_D) = \alpha$.



Example 1 Let us consider the normalized min-based possibilistic graph $\Pi G$
composed by the DAG of Figure 1 and the following initial distributions:

  $\Pi(A)$:        $a_1$   $a_2$
                   1       0.9

  $\Pi(B \mid A)$:   $a_1$   $a_2$
  $b_1$              0.3     1
  $b_2$              1       0.2

  $\Pi(C \mid A)$:   $a_1$   $a_2$
  $c_1$              1       0
  $c_2$              0.4     1

  $\Pi(D \mid B, C)$:   $b_1 \wedge c_1$   $b_1 \wedge c_2$   $b_2 \wedge c_1$   $b_2 \wedge c_2$
  $d_1$                 1                  1                  1                  1
  $d_2$                 1                  0.8                0                  1

These a priori and conditional possibilities encode the joint distribution relative to
A, B, C and D using (4) as follows: $\forall a, b, c, d$,
$\pi_D(a \wedge b \wedge c \wedge d) = \min(\Pi(a), \Pi(b \mid a), \Pi(c \mid a), \Pi(d \mid b \wedge c))$.
For instance $\pi_D(a_1 \wedge b_2 \wedge c_2 \wedge d_1) = \min(1, 1, 0.4, 1) = 0.4$.
Moreover we can check that $h(\pi_D) = 1$.
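
The min-based chain rule (4) can be replayed on Example 1 with a few lines of Python. The tables below are the values of Example 1 as best they can be read from the source (the degrees for A are taken from the initialized potential of cluster A in Table 1), so they should be treated as an assumption; the dictionary and function names are hypothetical.

```python
from itertools import product

P_A = {"a1": 1.0, "a2": 0.9}
P_B_A = {("b1", "a1"): 0.3, ("b2", "a1"): 1.0, ("b1", "a2"): 1.0, ("b2", "a2"): 0.2}
P_C_A = {("c1", "a1"): 1.0, ("c2", "a1"): 0.4, ("c1", "a2"): 0.0, ("c2", "a2"): 1.0}
P_D_BC = {
    ("d1", "b1", "c1"): 1.0, ("d1", "b1", "c2"): 1.0,
    ("d1", "b2", "c1"): 1.0, ("d1", "b2", "c2"): 1.0,
    ("d2", "b1", "c1"): 1.0, ("d2", "b1", "c2"): 0.8,
    ("d2", "b2", "c1"): 0.0, ("d2", "b2", "c2"): 1.0,
}

def joint(a, b, c, d):
    """Min-based chain rule for the DAG A -> B, A -> C, (B, C) -> D."""
    return min(P_A[a], P_B_A[(b, a)], P_C_A[(c, a)], P_D_BC[(d, b, c)])

states = list(product(["a1", "a2"], ["b1", "b2"], ["c1", "c2"], ["d1", "d2"]))
pi_D = {s: joint(*s) for s in states}

h = max(pi_D.values())                                        # normalization degree
poss_d2 = max(v for s, v in pi_D.items() if s[3] == "d2")     # possibility of D = d2
```

The brute-force maximum over all states is only feasible for small examples; it serves as a reference value against which a local propagation scheme can be compared.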
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Fig. 1. Example of a Multiply Connected DAG 



3.2 $\alpha$-Normalized Possibilistic Moral Graphs

A possibilistic moral graph over a set of variables V, denoted by $\mathcal{MG}$, is a
graphical representation where nodes are sets of variables called clusters and denoted by
$C_i$ (in Section 4, each cluster is associated with a variable and its parents). Each edge
in $\mathcal{MG}$ is labeled with the intersection of the adjacent clusters $C_i$ and
$C_j$, called separator and denoted by $S_{ij}$. We denote by $c_i$ and $s_{ij}$ the
possible instances of the cluster $C_i$ and the separator $S_{ij}$ respectively.
$c_i[A]$ denotes the instance in $c_i$ of the variable A.

For each cluster $C_i$ (resp. separator $S_{ij}$) of $\mathcal{MG}$, we assign a local
joint distribution relative to the variables in the cluster (resp. separator), called
potential and denoted by $\pi_{C_i}$ (resp. $\pi_{S_{ij}}$).

The joint distribution associated with $\mathcal{MG}$, denoted $\pi_{\mathcal{MG}}$, is
expressed by:

\[ \pi_{\mathcal{MG}}(A_1, .., A_N) = \min_{i=1..N} \pi_{C_i} \tag{5} \]

which is similar to the chain rule (4).

We now give some definitions regarding moral graphs:

Definition 2 Let $C_i$ and $C_j$ be two adjacent clusters in a moral graph
$\mathcal{MG}$ and let $S_{ij}$ be their separator. The separator $S_{ij}$ is said to be
stable if:

\[ \max_{C_i \setminus S_{ij}} \pi_{C_i} = \max_{C_j \setminus S_{ij}} \pi_{C_j} \tag{6} \]

where $\max_{C_i \setminus S_{ij}} \pi_{C_i}$ is the marginal distribution of $S_{ij}$
defined from $\pi_{C_i}$.

A moral graph $\mathcal{MG}$ is said to be stable if all of its separators are stable.

The following proposition shows that if a moral graph is stable, then the maximum value
of all clusters' potentials is the same.

Proposition 2 Let $\mathcal{MG}$ be a stabilized moral graph. Then $\forall C_i$,
$\alpha = \max \pi_{C_i}$.

In the following, a stabilized moral graph will be said to be an $\alpha$-normalized moral
graph to make explicit the degree $\alpha$. As we will see later, $\alpha$-normalized moral
graphs do not imply that $h(\pi_{\mathcal{MG}}) = \alpha$. When this equality holds, we talk
about consistent moral graphs.

Definition 3 Let $\mathcal{MG}$ be a stabilized $\alpha$-normalized moral graph and let
$\pi_{\mathcal{MG}}$ be its joint distribution obtained by (5). $\mathcal{MG}$ is said to be
consistent if $\alpha = h(\pi_{\mathcal{MG}})$.

4 Possibilistic Propagation 

4.1 Basic Ideas 

Given a normalized possibilistic graph $\Pi G$, our aim is to compute for any instance a
of a variable of interest A the possibility distribution $\Pi_D(a)$ inferred from
$\Pi G$. To compute $\Pi_D(a)$, we first define a new possibility distribution $\pi_a$
from $\pi_D$ as follows:

\[
\pi_a(\omega) =
\begin{cases}
\pi_D(\omega) & \text{if } \omega[A] = a \\
0 & \text{otherwise}
\end{cases}
\tag{7}
\]

Then, from $\pi_a$, it can be checked that:

\[ \Pi_D(a) = h(\pi_a) = \max_{\omega} \pi_a(\omega). \tag{8} \]

Note that, in general, $\pi_a$ is subnormalized, i.e., $h(\pi_a) < 1$.

The principle of the proposed propagation method is summarized in Figure 2, which explains
how to compute, in a local manner, the possibility distribution $\Pi_D(a)$. Note that
computing $\Pi_D(a)$ corresponds to possibilistic inference with no evidence. The more
general problem of computing $\Pi_D(a \mid e)$, where e is the total evidence, is addressed
in Section 5.

The basic steps of the propagation algorithm are:

- Initialization. Transform the initial normalized graph into a moral graph, by marrying
  parents of each node. Then quantify the moral graph using the initial conditional
  distributions of the DAG. Lastly, incorporate the instance a of the variable of interest
  A. The resulting moral graph is, in general, neither stable nor consistent.

- Stabilizing the Moral Graph. Reach the stability of the moral graph by propagating
  potentials.

- Checking and Recovering Consistency. One way to check the consistency of an
  $\alpha$-normalized moral graph is to construct its equivalent $\alpha$-DAG. We proceed
  iteratively by successively adding new links to the moral graph and then stabilizing it
  again until consistency is reached.

- Computing $\Pi_D(a)$. From a consistent $\alpha$-normalized moral graph we can compute
  the possibility degree $\Pi_D(a)$ by simply taking $\Pi_D(a) = \alpha$.

In the following, we denote by $\pi^t_{C_i}$ the potential of the cluster $C_i$ at a step
t of the propagation. t = 1 (resp. t = c) corresponds to the initialization (resp.
consistency) step.

Fig. 2. Propagation Algorithm



4.2 Initialization 

The first step of the propagation procedure is to transform the normalized pos- 
sibilistic graph into a moral graph and to quantify it by transforming the initial 
conditional distributions into local ones. Once the transformation is done, the 
initialization procedure incorporates the instance Ui of the variable of interest Ai 
into the moral graph by considering that Ai = ai. The outline of the initialization 
procedure is as follows: 



1. Building the moral graph:

1.0. For each variable $A_i \in V$:

Form a cluster, denoted by $C_i$, containing $\{A_i\} \cup U_{A_i}$.

1.1. $\pi^{1}_{C_i}(A_i \wedge U_{A_i}) \leftarrow \Pi(A_i \mid U_{A_i})$

1.2. Between any two clusters with a non-empty intersection, add a link with
their intersection as a separator. Potentials associated with separators are
equal to 1.

2. Incorporating the instance $a_i$ of the variable of interest $A_i$:

$\pi^{1}_{C_i}(c_i) \leftarrow \begin{cases} \pi^{1}_{C_i}(c_i) & \text{if } c_i[A_i] = a_i \\ 0 & \text{otherwise} \end{cases}$
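
The initialization steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary-based potentials, and the two `d1` degrees in the usage example are assumptions made for the sketch. The DAG structure (A is a parent of B and C, which are the parents of D) is the one implied by the clusters A, AB, AC and BCD of the running example.

```python
from itertools import combinations

def build_clusters(parents):
    """Step 1.0: one cluster per variable, holding the variable and its parents."""
    return {v: frozenset([v]) | frozenset(ps) for v, ps in parents.items()}

def build_separators(clusters):
    """Step 1.2: link every pair of clusters that share at least one variable."""
    seps = {}
    for (u, cu), (v, cv) in combinations(clusters.items(), 2):
        common = cu & cv
        if common:
            seps[(u, v)] = common
    return seps

def incorporate_instance(variables, potential, var, value):
    """Step 2: keep the degree of configurations with var = value, set the rest to 0."""
    idx = variables.index(var)
    return {cfg: (deg if cfg[idx] == value else 0.0)
            for cfg, deg in potential.items()}

# Structure of the running example (assumed from its clusters).
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
clusters = build_clusters(parents)
separators = build_separators(clusters)

# Hypothetical fragment of the BCD potential before instantiating D = d2;
# the d1 degrees are made up for illustration only.
bcd = {("b1", "c1", "d1"): 0.5, ("b1", "c1", "d2"): 1.0,
       ("b1", "c2", "d1"): 1.0, ("b1", "c2", "d2"): 0.8}
bcd_d2 = incorporate_instance(("B", "C", "D"), bcd, "D", "d2")
```

With this structure the sketch produces five separators; clusters A and BCD share no variable and are therefore not linked.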



Proposition 3 Let $\Pi G$ be a min-based directed possibilistic graph. Let $\mathcal{MG}$ be
the moral graph corresponding to $\Pi G$ given by the initialization procedure.
Let $\pi_a$ be the joint distribution given by (5) (which is obtained after
incorporating the instance $a$ of the variable of interest $A$). Let $\pi_{\mathcal{MG}}$ be the
joint distribution encoded by $\mathcal{MG}$ (given by (5)). Then $\pi_a = \pi_{\mathcal{MG}}$.

Example 2 Let us consider the $\Pi G$ given in Example 1. The moral graph
corresponding to $\Pi G$ is represented in Figure 3. Suppose that we are interested
in the value of $\Pi_{\mathcal{D}}(D = d_2)$; then after the initialization step we obtain the
potentials given in Table 1. We can check that the initialized moral graph is not
stable. For instance, the separator A between AB and AC is not stable since
$\max_{AB \setminus A} \pi^{1}_{AB}(a_2) = 0.9 \neq \max_{AC \setminus A} \pi^{1}_{AC}(a_2) = 1$.



Table 1. Initialized potentials

 a   π^1_A |  a   b   π^1_AB |  a   c   π^1_AC |  b   c   d   π^1_BCD |  b   c   d   π^1_BCD
 a1  1     |  a1  b1  0.3    |  a1  c1  1      |  b1  c1  d1  0       |  b2  c1  d1  0
 a2  0.9   |  a1  b2  1      |  a1  c2  0.4    |  b1  c1  d2  1       |  b2  c1  d2  0
           |  a2  b1  0.9    |  a2  c1  0      |  b1  c2  d1  0       |  b2  c2  d1  0
           |  a2  b2  0.2    |  a2  c2  1      |  b1  c2  d2  0.8     |  b2  c2  d2  1


[Figure 3: moral graph with clusters A, AB, AC and BCD; the legend
distinguishes clusters from separators.]

Fig. 3. Moral Graph of the DAG in Figure 1



4.3 Stabilizing the Moral Graph 

The stabilizing procedure is performed via a message passing mechanism 
between different clusters where each separator collects information from its 
corresponding clusters, then diffuses it to each of them in order to update them 
by taking the minimum between their initial potential and the one diffused by 
their separator. This operation is repeated until there is no modification on the 
cluster’s potentials. 

At each level $t$ when $\mathcal{MG}$ is not stable, the potentials of any adjacent clusters
$C_i$ and $C_j$ with separator $S_{ij}$ are updated as follows:

Collect evidence (Update separators):

    $\pi^{t+1}_{S_{ij}} \leftarrow \min\left(\max_{C_i \setminus S_{ij}} \pi^{t}_{C_i},\ \max_{C_j \setminus S_{ij}} \pi^{t}_{C_j}\right)$    (9)

Distribute evidence (Update clusters):

    $\pi^{t+1}_{C_i} \leftarrow \min(\pi^{t}_{C_i}, \pi^{t+1}_{S_{ij}})$    (10)

    $\pi^{t+1}_{C_j} \leftarrow \min(\pi^{t}_{C_j}, \pi^{t+1}_{S_{ij}})$    (11)




The outline of the stabilizing procedure is as follows: 



1. While the moral graph is not stable repeat step 2 

2. For each separator Sij 

2.0. Collect evidence in Sij from Ci and Cj using (9). 

2.1. Distribute evidence from Sij to Ci and Cj using (10) and (11).

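At each level of the stabilizing procedure (step 2) the moral graph encodes
the same joint distribution:

Proposition 4 Let $\pi^{t}_{\mathcal{MG}}$ and $\pi^{t+1}_{\mathcal{MG}}$ be the joint distributions computed,
respectively, at level $t$ and $t+1$. Then $\pi^{t}_{\mathcal{MG}} = \pi^{t+1}_{\mathcal{MG}}$.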

It can be shown that the stability is reached after a finite number of 
message passes of step 2 in the stabilizing procedure, and hence the stabilization 
procedure is polynomial. 
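
The collect and distribute updates (9) to (11) can be sketched as the loop below. This is an illustrative sketch under assumptions, not the authors' code: potentials are stored as dictionaries from configurations to degrees, the names `marginal` and `stabilize` are hypothetical, and the input data are the initialized potentials of Table 1 (with the zero-valued `d1` rows of BCD written out).

```python
def marginal(variables, potential, sep_vars):
    """Max-marginalise a cluster potential onto the separator variables."""
    idx = [variables.index(v) for v in sep_vars]
    out = {}
    for cfg, deg in potential.items():
        key = tuple(cfg[i] for i in idx)
        out[key] = max(out.get(key, 0.0), deg)
    return out

def stabilize(clusters, separators):
    """Repeat collect (9) and distribute (10), (11) until no potential changes."""
    changed = True
    while changed:
        changed = False
        for ci, cj, sep_vars in separators:
            vi, pi = clusters[ci]
            vj, pj = clusters[cj]
            mi = marginal(vi, pi, sep_vars)
            mj = marginal(vj, pj, sep_vars)
            sep = {k: min(mi[k], mj[k]) for k in mi}        # collect, (9)
            for variables, pot in ((vi, pi), (vj, pj)):     # distribute, (10)-(11)
                idx = [variables.index(v) for v in sep_vars]
                for cfg in pot:
                    new = min(pot[cfg], sep[tuple(cfg[i] for i in idx)])
                    if new != pot[cfg]:
                        pot[cfg] = new
                        changed = True
    return clusters

# Initialized potentials of Table 1 (instance D = d2 already incorporated).
clusters = {
    "A": (("A",), {("a1",): 1.0, ("a2",): 0.9}),
    "AB": (("A", "B"), {("a1", "b1"): 0.3, ("a1", "b2"): 1.0,
                        ("a2", "b1"): 0.9, ("a2", "b2"): 0.2}),
    "AC": (("A", "C"), {("a1", "c1"): 1.0, ("a1", "c2"): 0.4,
                        ("a2", "c1"): 0.0, ("a2", "c2"): 1.0}),
    "BCD": (("B", "C", "D"), {
        ("b1", "c1", "d1"): 0.0, ("b1", "c1", "d2"): 1.0,
        ("b1", "c2", "d1"): 0.0, ("b1", "c2", "d2"): 0.8,
        ("b2", "c1", "d1"): 0.0, ("b2", "c1", "d2"): 0.0,
        ("b2", "c2", "d1"): 0.0, ("b2", "c2", "d2"): 1.0}),
}
separators = [("A", "AB", ("A",)), ("A", "AC", ("A",)), ("AB", "AC", ("A",)),
              ("AB", "BCD", ("B",)), ("AC", "BCD", ("C",))]

stabilize(clusters, separators)
```

Because the updates only take minima and maxima of the stored degrees, no rounding occurs and the resulting potentials can be compared directly with the stabilized values reported for the example.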

From Propositions 3 and 4 we deduce that from the initialization to the
stability level (s), the moral graph encodes the same joint distribution. More
formally, at any level $t \in \{1, .., s\}$ we have:

    $\pi_a = \pi^{t}_{\mathcal{MG}}$    (12)

Example 3 Let us consider the moral graph initialized in Example 2. At the
stability level, reached after two message passes, we obtain the potentials given
in Table 2. Note that the maximum potential is the same in the four clusters, i.e.
$\max \pi^{s}_{A} = \max \pi^{s}_{AB} = \max \pi^{s}_{AC} = \max \pi^{s}_{BCD} = 0.9$.



Table 2. Stabilized potentials (t=s)

 a   π^s_A |  a   b   π^s_AB |  a   c   π^s_AC |  b   c   d   π^s_BCD |  b   c   d   π^s_BCD
 a1  0.9   |  a1  b1  0.3    |  a1  c1  0.9    |  b1  c1  d1  0       |  b2  c1  d1  0
 a2  0.9   |  a1  b2  0.9    |  a1  c2  0.4    |  b1  c1  d2  0.9     |  b2  c1  d2  0
           |  a2  b1  0.9    |  a2  c1  0      |  b1  c2  d1  0       |  b2  c2  d1  0
           |  a2  b2  0.2    |  a2  c2  0.9    |  b1  c2  d2  0.8     |  b2  c2  d2  0.9


In general, stability does not guarantee consistency. Indeed, it can be checked
in our example that $h(\pi^{s}_{\mathcal{MG}}) = 0.8 \neq 0.9$.

4.4 Checking and Recovering Consistency

Given a stabilized $\alpha$-normalized moral graph $\mathcal{MG}$, our aim is to check the
consistency of $\mathcal{MG}$ in the sense of Definition 3. First, we need a further
definition:

Definition 4 A cluster $C_i$ relative to the variable $A_i$ is said to be consistent if
for any instance $u_{A_i}$ of $U_{A_i}$:

    $\max_{a_i} \pi^{t}_{C_i}(a_i \wedge u_{A_i}) = \alpha$
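
Definition 4 can be checked mechanically, as in the following sketch. The function name and the dictionary layout are assumptions made for illustration; the data are the stabilized potentials of the cluster BCD from Table 2 and $\alpha = 0.9$.

```python
def inconsistent_instances(variables, potential, var, alpha):
    """Map each parent instance u to beta = max over var of the potential,
    keeping only the instances for which beta < alpha."""
    idx = variables.index(var)
    best = {}
    for cfg, deg in potential.items():
        parents = cfg[:idx] + cfg[idx + 1:]
        best[parents] = max(best.get(parents, 0.0), deg)
    return {u: beta for u, beta in best.items() if beta < alpha}

# Stabilized potential of the cluster BCD (Table 2).
bcd = {
    ("b1", "c1", "d1"): 0.0, ("b1", "c1", "d2"): 0.9,
    ("b1", "c2", "d1"): 0.0, ("b1", "c2", "d2"): 0.8,
    ("b2", "c1", "d1"): 0.0, ("b2", "c1", "d2"): 0.0,
    ("b2", "c2", "d1"): 0.0, ("b2", "c2", "d2"): 0.9,
}
violations = inconsistent_instances(("B", "C", "D"), bcd, "D", alpha=0.9)
```

An empty result means the cluster is consistent; here the parent instances (b1, c2) and (b2, c1) fall below 0.9, so the cluster BCD is not.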

Now, we provide a practical way to check the consistency of a moral graph.

Proposition 5 A moral graph $\mathcal{MG}$ is consistent if all its clusters are consistent.

The proof of this proposition is based on Proposition 1 and the following
technical lemma:

Lemma 1. Let $\mathcal{MG}$ be a stabilized $\alpha$-normalized moral graph and let $\pi_{\mathcal{MG}}$ be
its joint distribution. If all the clusters of $\mathcal{MG}$ are consistent, then there exists
an $\alpha$-DAG $G'$ such that its joint distribution satisfies $\pi_{\mathcal{MG}} = \pi_{G'}$.

Case of consistency. In the case where the $\alpha$-normalized moral graph is
consistent, the computation of $\alpha$ is immediate with the help of Proposition 1 and
Lemma 1. Namely, we can derive the following corollary:

Corollary 1 Let $\mathcal{MG}$ be a consistent $\alpha$-normalized moral graph and let $a_i$ be
any instance of the variable of interest $A_i$. Then,

    $\Pi_{\mathcal{D}}(a) = \max \pi_{C_i} = \alpha$.

In other terms, we can compute the possibility degree $\Pi_{\mathcal{D}}(a)$ from the
consistent $\alpha$-normalized moral graph by simply taking the maximum potential of
any cluster.



Case of inconsistency. If there exists a variable $A_i$ and an instance $u_{A_i}$ s.t.
$\max_{a_i} \pi^{t}_{C_i}(a_i \wedge u_{A_i}) = \beta < \alpha$ then the moral graph is not yet consistent. In
this case, we should drop the inconsistency from $C_i$ by replacing, for any instance
$u_{A_i}$ s.t. $\max_{a_i} \pi^{t}_{C_i}(a_i \wedge u_{A_i}) = \beta < \alpha$, the potential $\beta$ by $\alpha$. However, we
should not lose this degree.

Thus, the idea is to check whether the parent variables of $A_i$ are linked, i.e.
whether there exists a cluster which contains $U_{A_i}$. If this is the case, the
potential of this cluster is modified by incorporating the degree $\beta$.

If such a cluster does not exist, we should create new links between variables
in $U_{A_i}$. More precisely, we select any of the parents of $A_i$ and we add to its
parent set the remaining variables in $U_{A_i}$. Then, when quantifying these new
links we can incorporate the degree $\beta$. The modifications of the moral graph are
summarized as follows:
summarized as follows: 

1. Drop the inconsistency:

Let $C_i$ be an inconsistent cluster. Let $X$ be the set of all the instances $u_{A_i}$
s.t. $\max_{a_i} \pi^{t}_{C_i}(a_i \wedge u_{A_i}) = \beta < \alpha$.

For any instance $u_{A_i}$ in $X$, replace the potential $\beta$ by $\alpha$.

2. Add new links between parents (if $\nexists$ a cluster containing $U_{A_i}$):

Let $A_j$ be any of the parents of $A_i$.

- Add to $C_j$ the variable set $T = U_{A_i} \setminus \{A_j\}$ as additional parents of $A_j$, i.e.
$U_{A_j} \leftarrow U_{A_j} \cup T$.

- Update the separators associated with $C_j$ (since the intersection between
the other clusters and $C_j$ is modified).

3. Modify the parent cluster potential:

Let $C_j$ be the cluster containing $U_{A_i}$:

    $\pi^{t+1}_{C_j} \leftarrow \min(\pi^{t}_{C_j}, \pi_{new})$

where $\pi_{new}(u_{A_i}) = \beta$ if $u_{A_i} \in X$, and takes the value of the "otherwise"
branch (not legible in the source) for the remaining instances.

Proposition 6 Let $\mathcal{MG}$ be a stable moral graph and $\mathcal{MG}'$ be the new moral
graph obtained as a result of the procedure above. Let $\pi^{t}_{\mathcal{MG}}$ be the joint
distribution encoded by $\mathcal{MG}$ and $\pi^{t+1}_{\mathcal{MG}'}$ be the joint distribution encoded by
$\mathcal{MG}'$. Then, $\pi^{t}_{\mathcal{MG}} = \pi^{t+1}_{\mathcal{MG}'}$.

If step 3 in the modification procedure updates any of the parent cluster
potentials, then we should restabilize the moral graph and recheck the
consistency. The algorithm stops when consistency is reached.



Example 4 Let us consider the moral graph stabilized in Example 3. We can
check that this moral graph is inconsistent since $\max \pi^{s}_{AB} = 0.9$ while
$h(\pi^{s}_{\mathcal{MG}}) = 0.8$. This is due to the cluster BCD corresponding to the variable D
since
$\max(\pi^{s}_{BCD}(d_1 \wedge b_1 \wedge c_2), \pi^{s}_{BCD}(d_2 \wedge b_1 \wedge c_2)) = 0.8 < 0.9$ and
$\max(\pi^{s}_{BCD}(d_1 \wedge b_2 \wedge c_1), \pi^{s}_{BCD}(d_2 \wedge b_2 \wedge c_1)) = 0 < 0.9$.

Thus we should modify the potential of BCD and modify, for instance, the cluster
AB by considering C as a new parent of B. This entails a modification of the
moral graph by replacing the cluster AB by ABC and adding the corresponding
separators.

The new cluster potentials after the modification and the restabilization
are given in Table 3. Note that we get a 0.8-normalized moral graph, thus the
possibility measure $\Pi_{\mathcal{D}}(d_2)$ corresponds to the maximum potential in the
clusters, i.e. $\Pi_{\mathcal{D}}(d_2) = 0.8$.



5 Handling the Evidence 

The proposed propagation algorithm can be easily extended in order to take into
account new evidence $e$, which corresponds to a set of instantiated variables. The
computation of $\Pi_{\mathcal{D}}(a \mid e)$ is performed via two calls of the above propagation
algorithm in order to compute successively $\Pi_{\mathcal{D}}(e)$ and $\Pi_{\mathcal{D}}(a \wedge e)$. Then, using
the min-based conditioning, we get:

    $\Pi_{\mathcal{D}}(a \mid e) = \begin{cases} \Pi_{\mathcal{D}}(a \wedge e) & \text{if } \Pi_{\mathcal{D}}(a \wedge e) < \Pi_{\mathcal{D}}(e) \\ 1 & \text{otherwise} \end{cases}$

Table 3. Consistent potentials
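
The min-based conditioning above is a one-line rule once the two possibility degrees are available. The sketch below uses a hypothetical function name and takes the two degrees as plain numbers, however they were computed.

```python
def min_based_conditioning(pi_a_and_e, pi_e):
    """Min-based conditioning: keep Pi(a and e) when it is strictly below
    Pi(e), otherwise return 1."""
    return pi_a_and_e if pi_a_and_e < pi_e else 1.0

# Degrees of the kind produced by two runs of the propagation algorithm.
conditional = min_based_conditioning(0.4, 0.8)
```

The design point is that the two propagation runs are independent of each other; only their scalar results are combined.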



The computation of $\Pi_{\mathcal{D}}(e)$ needs a slight transformation of the initialization
procedure since the evidence can be obtained on several variables. More precisely,
step 2 of Section 4.2 is replaced by:

Incorporating the instance $a_1 \wedge .. \wedge a_M$ of the variables of interest $A_1, .., A_M$,
i.e.:

    $\forall i \in \{1, .., M\},\ \pi^{1}_{C_i}(c_i) \leftarrow \begin{cases} \pi^{1}_{C_i}(c_i) & \text{if } c_i[A_i] = a_i \\ 0 & \text{otherwise} \end{cases}$



Example 5 Let us consider the $\Pi G$ given in Example 1. Suppose that we are
interested in the value of $\Pi_{\mathcal{D}}(a_1 \mid d_2)$. In other terms, we want to compute the
impact of the evidence $D = d_2$ on the instance $a_1$ of the variable $A$. Then we
should first compute $\Pi_{\mathcal{D}}(d_2)$, then $\Pi_{\mathcal{D}}(a_1 \wedge d_2)$. The value $\Pi_{\mathcal{D}}(d_2)$ was already
computed in Example 4, and is equal to 0.8. Then we integrate $A = a_1$ in
the consistent moral graph and apply the propagation procedure again. The new
consistent potentials are given in Table 4. From these potentials we deduce that
$\Pi_{\mathcal{D}}(a_1 \wedge d_2) = 0.4$, thus $\Pi_{\mathcal{D}}(a_1 \mid d_2) = 0.4$ since $\Pi_{\mathcal{D}}(a_1 \wedge d_2) < \Pi_{\mathcal{D}}(d_2)$.



Table 4. Consistent potentials 

6 Conclusion

This paper has proposed an algorithm for computing the possibility degree of a 
variable of interest given some evidence e in min-based possibilistic graphs. Our 
algorithm is mainly based on two procedures: stabilization and checking consis- 
tency. These procedures are both polynomial. The weakness of our algorithm 
is that in a case of inconsistencies, some clusters are enlarged with additional 
variables. However, the maximum number of added variables, due to an inconsis- 
tent cluster, does not exceed the maximal cardinality of parents of the variable 
associated with the inconsistent cluster. 

The algorithm proposed in this paper can be directly applied for revising a min- 
based possibilistic graph by integrating a new piece of knowledge (and not simply 
an evidence or observation). Namely, our algorithm can be used to construct a 
new DAG taking into account this new knowledge. 

Future work will experimentally compare the proposed algorithm with
a direct adaptation of the probabilistic propagation algorithms.


Abstract. The paper presents a method to compute a posteriori probability
intervals when the initial conditional information is also given as
probability intervals. An exact computation has to work with the
associated convex sets of probabilities. Probability trees are used to
represent these initial conditional convex sets, because they greatly
reduce the required space. This paper proposes a simulated annealing
algorithm, using probability trees as the representation of the convex
sets, in order to compute the a posteriori intervals.



1 Introduction 

Bayesian networks are graphical structures which are used to represent efficiently 
joint probability distributions, by taking advantage of the independence rela- 
tionships [18] among the variables. Different problems can be solved using this 
graphical representation of the joint distribution. One of the most common tasks 
is the computation of posterior marginals given that the value of some of the 
variables is known. This task is called probability propagation. 

One of the main problems faced when building Bayesian networks is the 
introduction of a large number of probabilities. Very often, an expert is more 
comfortable giving an interval of probability rather than a precise probability. 
Even if we use a learning algorithm to obtain probabilities, we may only have 
small samples for certain configurations of variables in a distribution. Therefore, 
it may also be more appropriate to estimate some kind of imprecise probabilities 
in this case. 

In general, the use of imprecise probability models can be useful in many 
situations [26] . There are various mathematical models for imprecise probabilities 
[25,26]. Out of all these models, we think that convex sets of probabilities is the 
most suitable model for calculating with and representing imprecise probabilities, 
because there is a specific interpretation of numeric values [25] . They are powerful 
enough to represent the result of basic operations within the model without 
having to make approximations that cause loss of information, as in interval 
probabilities. Convex sets are a more general tool for representing unknown 
probabilities than intervals: there is always a convex set associated with a system 
of probabilistic intervals, but given a convex set there is not always a proper 
representation by using intervals. However, interval probabilities are the most
natural way in which imprecise probabilities are given in practice. In this paper,
therefore, we will assume that initial probability distributions are given as
probability intervals, but computations are carried out by considering their
associated convex sets.

Some authors have considered the propagation of probabilistic intervals in 
graphical structures [1,12,23,24]. However in the proposed procedures, there is 
no guarantee that the calculated intervals are always the same as those obtained 
by using a global computation. In general, it can be said that calculated bounds 
are wider than exact ones. The problem is that exact bounds need a computation 
with the associated convex sets of probabilities. This is the approach followed 
by Cano, Moral and Verdegay-Lopez [10]. They assume that there is a convex 
set of conditional probabilities for each configuration of parent variables in the 
dependence graph. They give a model to obtain the exact bounds using local 
computations. However, working with convex sets may be very inefficient: if we
have $n$ variables and each variable, $X_i$, has a convex set with $l_i$ extreme points
as conditional information, the propagation algorithm is of the order
$O(K \cdot \prod_{i=1}^{n} l_i)$, where $K$ is the complexity of carrying out a simple
probabilistic propagation. This is so because convex set propagation is
equivalent to the propagation of all the global probabilities that can be
obtained by choosing an exact conditional probability in each of the convex sets.

Probability trees [9,20] can be used to represent probability potentials. The 
authors have used probability trees to propagate efficiently in Bayesian networks 
using a join tree when resources (memory and time) are limited, obtaining a 
greater or smaller error depending on the available time. Probability trees [8] 
can also be applied in order to propagate the convex sets associated to the 
interval of probabilities improving computing time, obtaining exact or approx- 
imated intervals, depending on the available computing time. Another solution 
to the problem of propagating the convex sets associated to intervals, is by using 
combinatorial optimization techniques such as simulated annealing [6], genetic 
algorithms [7], and gradient techniques [11]. 

In this paper, we propose adapting a simulated annealing algorithm in order
to use probability trees. The rest of the paper is organized as follows. In section
2 we describe the basics of probability propagation in Bayesian networks; section
3 presents basic notions about convex sets of probabilities and their relationships
with probability intervals; section 4 studies probability trees as a tool to represent
potentials in a compact way and also how they can be applied to represent
convex sets of probabilities; section 5 describes the proposed simulated annealing
algorithm; in section 6 we show some experimental work and finally section 7
gives some conclusions.



2 Probability Propagation in Bayesian Networks 

Let $X = \{X_1, \ldots, X_n\}$ be a set of variables. Let us assume that each variable
$X_i$ takes values on a finite set $U_i$. For any set $U$, $|U|$ represents the number
of elements it contains. If $I$ is a set of indices, we will write $X_I$ for the set
$\{X_i \mid i \in I\}$. Let $N = \{1, \ldots, n\}$ be the set of all the indices. The Cartesian
product $\prod_{i \in I} U_i$ will be denoted by $U_I$. Given $x \in U_I$ and $J \subseteq I$, $x_J$ will denote
the element of $U_J$ obtained from $x$ by dropping the coordinates which are not in
$J$. In Shenoy and Shafer's [21] terminology, a mapping from a set $U_I$ on $[0, 1]$
will be called a valuation $h$ for $X_I$. Over valuations, Shenoy and Shafer define
the operations of combination $h_1 \otimes h_2$ (multiplication) and marginalization
$h^{\downarrow J}$ (adding in the removed variables). They give an abstract set of axioms to
operate with valuations.

A Bayesian network is a directed acyclic graph where each node represents
a random variable $X_i$, and the topology of the graph shows the independence
relations between variables, according to the d-separation criterion [18]. Also,
each node $X_i$ has attached a conditional probability distribution $p_i(X_i \mid F(X_i))$
for the variable given its parents $F(X_i)$. Following Shafer and Shenoy's
terminology, these conditional distributions can be considered as valuations. In
this way, taking into account the independence relations expressed by the graph,
the Bayesian network determines a unique joint probability distribution:

    $p(x) = \prod_{i \in N} p_i(x_i \mid x_{F(X_i)}) \quad \forall x \in U_N$    (1)


An observation is the knowledge of the exact value $X_i = e_i$ of a variable.
The set of observations will be denoted by $e$, and called the evidence set. $E$ will
be the set of indices of the observed variables. Every observation, $X_i = e_i$, is
represented by means of a valuation which is a Dirac function defined on $U_i$ as
$\delta_i(x_i; e_i) = 1$ if $e_i = x_i$, $x_i \in U_i$, and $\delta_i(x_i; e_i) = 0$ if $e_i \neq x_i$.

The aim of probability propagation is to calculate the a posteriori probability
function $p(x_k \mid e)$, for every $x_k \in U_k$, where $k \in \{1, \ldots, n\} - E$. Given an evidence
set $e$,

    $p(x_k \mid e) \propto \sum_{x_j,\, j \neq k} \left( \prod_{i} p_i(x_i \mid x_{F(X_i)}) \prod_{e_i \in e} \delta_i(x_i; e_i) \right)$.

In fact, the previous formula is the expression for $p(x_k \cap e)$. The vector of
values $(p(x_k \cap e))$, $x_k \in U_k$, will be denoted as $R_k$.
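
To make the definition of $R_k$ concrete, the sketch below computes it by brute-force enumeration of the joint distribution (1) combined with the Dirac valuations. It is only an illustration of the formula, not a propagation algorithm: the function name, the two-variable network and all its numbers are hypothetical.

```python
from itertools import product

def unnormalized_posterior(domains, factors, evidence, k):
    """R_k: sum of the joint over all configurations compatible with the
    evidence, grouped by the value of variable k."""
    names = list(domains)
    result = {v: 0.0 for v in domains[k]}
    for values in product(*(domains[n] for n in names)):
        x = dict(zip(names, values))
        # Dirac valuations: configurations contradicting the evidence weigh 0.
        if any(x[var] != val for var, val in evidence.items()):
            continue
        weight = 1.0
        for factor in factors:
            weight *= factor(x)
        result[x[k]] += weight
    return result

# Hypothetical network X1 -> X2 with binary variables.
domains = {"X1": [0, 1], "X2": [0, 1]}
p1 = {0: 0.6, 1: 0.4}
p2 = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}  # key: (x2, x1)
factors = [lambda x: p1[x["X1"]], lambda x: p2[(x["X2"], x["X1"])]]

R1 = unnormalized_posterior(domains, factors, {"X2": 1}, "X1")
total = sum(R1.values())
posterior = {value: weight / total for value, weight in R1.items()}
```

Dividing $R_k$ by its sum, which equals $p(e)$, gives the normalized posterior $p(x_k \mid e)$.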

A known propagation algorithm can be constructed by transforming the directed
acyclic graph into a tree of cliques. Basically there are two different schemes
[17,21] to propagate on a tree of cliques. We will follow the Shafer and Shenoy
scheme [21] because it does not require divisions between valuations, an
operation that is not defined for convex sets of probabilities. Every clique $C_i$
has attached a valuation $\Psi_{C_i}$, initially set to the identity mapping. There are
two messages (valuations) $M_{C_i \to C_j}$, $M_{C_j \to C_i}$ between every two adjacent cliques
$C_i$ and $C_j$. $M_{C_i \to C_j}$ is the message that $C_i$ sends to $C_j$, and $M_{C_j \to C_i}$ the message
that $C_j$ sends to $C_i$. Every conditional distribution $p_i$ and every observation $\delta_i$
is assigned to one clique $C_i$ that contains all its variables. If $h_i$ ($p_i$ or $\delta_i$) is a
valuation assigned to clique $C_i$, then $\Psi_{C_i}$ is transformed in the following way:
$\Psi_{C_i} = \Psi_{C_i} \otimes h_i$. The propagation algorithm is carried out by traversing the
tree of cliques from leaves to root and then from root to leaves, updating
messages in the following way:

    $M_{C_i \to C_j} = \left\{ \Psi_{C_i} \otimes \left( \bigotimes_{C_k \in Ady(C_i, C_j)} M_{C_k \to C_i} \right) \right\}^{\downarrow C_i \cap C_j}$    (2)

where $Ady(C_i, C_j)$ is the set of cliques adjacent to $C_i$ except $C_j$.

Once the propagation is done, the a posteriori distribution $R_k$ for variable $X_k$
can be calculated by looking for a clique $C_i$ containing $X_k$ and using the
following expression:

    $R_k = \left\{ \Psi_{C_i} \otimes \left( \bigotimes_{C_j \in Ady(C_i)} M_{C_j \to C_i} \right) \right\}^{\downarrow k}$    (3)

where $Ady(C_i)$ is the set of cliques adjacent to $C_i$.



3 Convex Sets of Probability Distributions 

With imprecise probabilities, a piece of information for variables in a set $I$ will
be a closed, convex set, $H$, of mappings $p : U_I \to \mathbb{R}_0^{+}$ with a finite set of
extreme points. Every mapping is given by the vector of values $(p(x))_{x \in U_I}$.




Fig. 1. Propagated Convex Set, Rk 



The propagation of convex sets of probabilities [8,10] is completely analogous
to the propagation of probabilities. The formulas are the same, but every
valuation $h_i$ is now a convex set with $l_i$ extreme points. The operations of
combination and marginalization are the same as in the probabilistic case,
repeated for all the extreme points of the operated sets. The result of the
propagation for a variable, $X_k$, will be a convex set of mappings from $U_k$ in
$[0, 1]$. For the sake of simplicity, let us assume that this variable has two
values: $x_k^1, x_k^2$. The result of the propagation is a convex set on $\mathbb{R}^2$ of the form
of Figure 1, which will be called $R_k$. The points of this convex set, $R_k$, are
obtained in the following way: if $P$ is a global probability distribution, formed
by selecting a fixed probability for each convex set, then associated to this
probability we shall obtain a point $(t_1, t_2) \in R_k$ where $t_1 = P(x_k^1 \cap e)$,
$t_2 = P(x_k^2 \cap e)$, and $e$ is the given evidence or family of observations. The
projection of the point $(t_1, t_2)$ on the line $t_1 + t_2 = 1$ is equivalent to dividing
by $P(e)$ and gives rise to a normalized probability distribution:
$P(x_k^i \mid e) = t_i/(t_1 + t_2)$, $i = 1, 2$. So, the final intervals $[a, b]$ associated to
$x_k^i$ can be calculated with formula 4:

    $a = \inf\{ t_i/(t_1 + t_2) \mid (t_1, t_2) \in R_k \}$

    $b = \sup\{ t_i/(t_1 + t_2) \mid (t_1, t_2) \in R_k \}$    (4)
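
Formula 4 can be evaluated directly when $R_k$ is available through its extreme points. The sketch below assumes that $R_k$ is given as a finite list of extreme points, each with strictly positive total mass; under that assumption the ratio $t_i/(t_1 + t_2)$ is a linear-fractional function, so its infimum and supremum over the convex set are attained at extreme points. The function name and the sample points are hypothetical.

```python
def posterior_interval(extreme_points, i):
    """Interval [a, b] of formula 4 for case i, computed over the extreme
    points of R_k (each point must have positive total mass)."""
    ratios = [point[i] / sum(point) for point in extreme_points]
    return min(ratios), max(ratios)

# Hypothetical extreme points (t1, t2) of a propagated convex set R_k.
R_k = [(0.125, 0.375), (0.25, 0.25), (0.375, 0.125)]
a, b = posterior_interval(R_k, 0)
```

For these sample points the interval of the first case is [0.25, 0.75].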

3.1 Convex Sets and Intervals 

As mentioned in the introduction, we are trying to solve the problem of
propagation in dependence graphs where each initial conditional information is
given by an interval. We can obtain the convex sets associated to the initial
information (intervals of probabilities), then do the computations using these
convex sets, and finally obtain the associated a posteriori intervals. The extreme
points of the convex set associated to a set of probability intervals can be
calculated in an efficient way with the algorithm presented in [4].

Suppose $X_I$ is a set of random variables taking values on the finite set $U_I$
and $Y$ a random variable taking values on a finite set $V$. Then, if we have
a conditional distribution $P(Y \mid X_I)$ given with probability intervals, we must
apply the algorithm of Campos et al. [4] for each $x_I \in U_I$ to obtain the global
convex set $H^{Y|X_I}$. That is to say, if $Ext_{x_I}$ is the set of extreme points of the
convex set associated to the distribution $P(Y \mid X_I = x_I)$, then the global convex
set can be obtained with the following Cartesian product:

    $\prod_{x_I \in U_I} Ext_{x_I}$    (5)

That is, a conditional extreme probability for $Y$ given $X_I$ is composed of an
extreme probability conditioned to $X_I = x_I$ for each possible value $x_I$.
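
The Cartesian product in expression 5 can be enumerated as in the following sketch. The function name, the dictionary layout and the numbers are illustrative assumptions: each parent configuration maps to the extreme points of its conditional convex set, and each global extreme point picks one of them per configuration.

```python
from itertools import product

def global_extreme_points(ext):
    """Enumerate expression 5: one extreme conditional distribution is chosen
    for every parent configuration x_I."""
    configs = list(ext)
    return [dict(zip(configs, choice))
            for choice in product(*(ext[c] for c in configs))]

# Illustrative extreme points of P(Y | X = x1) and P(Y | X = x2).
ext = {
    "x1": [(0.2, 0.8), (0.3, 0.7)],
    "x2": [(0.6, 0.4), (0.4, 0.6)],
}
points = global_extreme_points(ext)
```

The number of global extreme points is the product of the sizes of the individual sets, which is the source of the representation cost discussed in section 4.1.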



4 Probability Trees 

Probability trees have been demonstrated to be useful to represent probability
distributions in order to obtain more compact representations than tables. The
size of a table is exponential in the number of parameters (number of variables
in the distribution), while the size of a probability tree can be much
smaller when there are regularities (asymmetrical independences) in the
probability distribution. A probability tree $\mathcal{T}$ [3,5,9,13,16,19,20,22,27] for a
set of variables $X_I$ is a directed labeled tree, where each internal node
represents a variable $X_i \in X_I$ and each leaf node represents a real number
$r \in \mathbb{R}$. Each internal node will have as many outgoing arcs as the variable it
represents has possible values. Let $X_I$, $X_J$, $X_K$ and $X_L$ be four disjoint sets of
variables. $X_I$ and $X_J$ are independent given $X_L$ in context $X_K = x_K$, noted as
$I_c(X_I; X_J \mid X_L; X_K = x_K)$, if $P(X_I \mid X_L, X_J, X_K = x_K) = P(X_I \mid X_L, X_K = x_K)$
whenever $P(X_J, X_L, X_K = x_K) > 0$. When $X_L$ is empty, it can be said that $X_I$
and $X_J$ are independent in context $X_K = x_K$.

Cano and Moral [5] present a methodology to build a probability tree from
a probability table. They also propose a way of approximating potentials with
probability trees of a given limited size, and they give exact and approximate
methods to calculate with probability trees. They show how to marginalize a
probability tree to a set of variables, combine two probability trees and restrict
a probability tree to a given configuration of variables.

4.1 Using Probability Trees in the Propagation of Probability 
Intervals 

Probability trees are especially useful when dealing with the problem of
propagating probability intervals. As we mentioned in section 3.1, a set of
intervals is transformed into a set of extreme points to do the computations.
Besides this, we transform the problem into an equivalent one. For each variable
$X_i$, we originally give a valuation $h_i$ for $X_i$ conditioned to its parents $F(X_i)$.
This valuation is a convex set of $l$ conditional probability distributions,
$h_i = \{p_1, \ldots, p_l\}$. To implement propagation algorithms we add to the domain of
$h_i$ a new variable, $T_i$, taking values on the set $\{t_1, \ldots, t_l\}$. This variable is
made a parent node of $X_i$ in the dependence graph. On this node we consider that
all the probability distributions are possible, that is to say, the valuation for
$T_i$ is a convex set with $l$ extreme points, each one degenerated in one of the
possible cases of $T_i$. Now, the probability of $X_i$ given its parents is a unique,
determined probability distribution. Every extreme point $p_i$ of the conditional
convex set can be recovered by fixing $T_i$ to one of its possible values
$\{t_1, \ldots, t_l\}$ in valuation $h_i$. We can verify that the structure of the problem
does not change with this transformation. The only thing that has happened is
that our lack of knowledge about the conditional probabilities is now explicit
with the help of an additional node expressing all the possible conditional
probability distributions. Nothing is known about this node. The idea is to keep
the different values of $T$ as parameters in the problem.

The previous way of representing conditional convex sets requires a large
amount of memory, which can be reduced with probability trees (see Cano and
Moral [8] for more details). Suppose we want to represent the global conditional
convex set $H^{Y|X_I}$ and we know the extreme points of each one of the convex
sets $H^{Y|X_I = x_I}$. This requires a set of extreme points given by expression 5. In
general this leads us to a high cost representation. In [8] the authors use a
transparent variable $T_{x_I}$ for each $x_I \in U_I$. $T_{x_I}$ will have as many cases as the
number of extreme points which $H^{Y|X_I = x_I}$ has. The reduction in the
representation is obtained by taking asymmetric independences among these
transparent variables into account. It is obvious that
$I_c(Y; T_{x_I} \mid X_I = x_J, x_I \neq x_J)$, $\forall x_I \in U_I$: given $X_I = x_J$, $Y$ does not depend on
$T_{x_I}$ for $x_I \neq x_J$. A compact probability tree can then be built if we put the
variables of $X_I$ in the upper levels of the tree, then the transparent variables
$T_{x_I}$, and finally the variable $Y$ in the lower level. With this probability tree,
a point of the global convex set $H^{Y|X_I}$ can be found by fixing all transparent
nodes $T_{x_I}$ to one of their values. This corresponds to the operation of
restriction (see for example [20]) in probability trees. In figure 2 we can see an
example where a probability tree represents the global information associated to
the two convex sets $H^{Y|X=x_1}$ and $H^{Y|X=x_2}$. The figure also shows the global
information by means of a table. In the probability tree of figure 2, we get the
extreme points by fixing $T_{x_1}$ and $T_{x_2}$ to one of their values. For example,
restricting the probability tree to one value of $T_{x_1}$ and one value of $T_{x_2}$, we
get a new probability tree that gives us the extreme point $r_2$.

[Figure 2: a probability tree and the equivalent table of global extreme points.
Recoverable table rows, with one column per configuration (x, y):
.2 .8 .6 .4
.3 .7 .4 .6
.3 .7 .6 .4]

Fig. 2. A probability tree for $H^{Y|X}$
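
The restriction operation that recovers one extreme point from such a tree can be sketched as follows. This is an assumption-laden illustration, not the representation used by the authors: a tree is encoded as a nested tuple `(variable, {value: subtree})` with numeric leaves, the names `T1` and `T2` stand for the two transparent variables, and the assignment of leaf values to the cases `t1` and `t2` is hypothetical.

```python
def restrict(tree, assignment):
    """Restrict a probability tree to a partial configuration: every node of
    an assigned variable is replaced by the child selected by its value."""
    if not isinstance(tree, tuple):
        return tree  # leaf: a real number
    var, children = tree
    if var in assignment:
        return restrict(children[assignment[var]], assignment)
    return (var, {val: restrict(sub, assignment) for val, sub in children.items()})

def evaluate(tree, config):
    """Follow the branches selected by a full configuration down to a leaf."""
    while isinstance(tree, tuple):
        var, children = tree
        tree = children[config[var]]
    return tree

# X in the upper level, then one transparent variable per value of X, then Y.
tree = ("X", {
    "x1": ("T1", {"t1": ("Y", {"y1": 0.2, "y2": 0.8}),
                  "t2": ("Y", {"y1": 0.3, "y2": 0.7})}),
    "x2": ("T2", {"t1": ("Y", {"y1": 0.6, "y2": 0.4}),
                  "t2": ("Y", {"y1": 0.4, "y2": 0.6})}),
})

point = restrict(tree, {"T1": "t2", "T2": "t1"})
```

The tree stores 8 leaves, whereas a table listing the 4 global extreme points over the 4 configurations of (X, Y) stores 16 numbers; the gap grows with the number of parent configurations.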

Let $S_T$ be the set of all transparent variables on the tree of cliques. Using
probability trees as valuations, the result $R_k$ of the propagation algorithm
described in section 2 for a variable $X_k$ will be represented with a valuation $h_k$
(a probability tree) containing variable $X_k$ and all variables in $S_T$. Selecting the
different values for variables in $S_T$ we can obtain the extreme points of the a
posteriori convex set for $X_k$. Then, applying expression 4 we will obtain the
intervals for the cases of $X_k$. This method is equivalent to doing the following
number of probabilistic propagations: $\prod_{T_i \in S_T} |T_i|$, where $|T_i|$ is the number of
cases of $T_i$.

5 Propagating Convex Sets with Simulated Annealing 
and Probability Trees 

The use of probability trees to represent convex sets of probabilities can
significantly reduce the space required to store them. However, the number of
probabilistic propagations required to obtain a posteriori information is not
reduced. Because of the large number of extreme points associated to each
conditional information, calculating the a posteriori information could be
infeasible, as we pointed out in the introduction. In that case, a simulated
annealing algorithm could be applied to obtain approximate results.

5.1 Simulated Annealing Algorithm 

Simulated annealing [15] is an optimization technique to solve combinatorial
optimization problems. Assume a cost function $C : S \to \mathbb{R}$ defined on the
search space $S$ (a set of $n$ variables). Our purpose is to find a configuration
$s \in S$ (a configuration of the $n$ variables) that minimizes (or maximizes) the
function $C$. This algorithm presupposes a generation mechanism to go from a
configuration $s_i$ to another one $s_{i+1}$ by a small perturbation. The set of
configurations that can be reached from a configuration $s$ is called the
neighbourhood of $s$, $N(s)$. Simulated annealing is similar to hill climbing, but
sometimes it accepts a configuration with higher cost. In this way it avoids
being trapped at a local minimum. The possibility of going to a configuration of
higher cost depends on a parameter, $t$, called temperature. Initially $t$ is high
and the possibility of a cost increase is high. Like hill climbing, the simulated
annealing algorithm starts on a random configuration. When applying the
generation mechanism to configuration $s_i$ to obtain $s_{i+1}$, if $C(s_{i+1}) < C(s_i)$ then
$s_{i+1}$ is accepted. Otherwise, it is only accepted with probability
$Pr(s_i \to s_{i+1}) = e^{-(C(s_{i+1}) - C(s_i))/t}$, where $t$ denotes the current
temperature. Under an appropriate cooling procedure this algorithm converges
to the global minimum.

A simple cooling procedure was introduced by Kirkpatrick, Gelatt and Vecchi
[15]. With this procedure, at each step the temperature is decreased according to
the formula $t_{i+1} = \alpha \cdot t_i$, where $\alpha$ is a given constant. This is the
procedure used in this paper. Another modification that can improve efficiency was
proposed by Green and Supowit [14]. According to it, if we can calculate the cost
of all neighbouring configurations in $N(s_i)$, then instead of randomly choosing a
configuration from $N(s_i)$ and accepting it according to the above procedure, it is
better to choose a configuration $s_{i+1}$ from $N(s_i)$ with a probability proportional to
$e^{-(C(s_{i+1}) - C(s_i))/t}$. This method is also used in our algorithm.
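
The two ingredients above (geometric cooling and selection among all neighbours with probability proportional to $e^{-(C(s_{i+1}) - C(s_i))/t}$) can be sketched in Python as follows. This is only an illustrative sketch, not the implementation used in the paper: the function names, the toy cost function and the toy neighbourhood are assumptions introduced here.

```python
import math
import random


def rejectionless_step(neighbours, cost, t, rng):
    """Pick the next configuration with probability proportional to
    exp(-(C(s') - C(s)) / t).

    C(s) is the same for every candidate, so the smallest candidate cost is
    subtracted instead: the proportions are unchanged and exp() cannot overflow.
    """
    costs = [cost(s) for s in neighbours]
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / t) for c in costs]
    return rng.choices(neighbours, weights=weights, k=1)[0]


def anneal(start, neighbourhood, cost, t0=2.0, alpha=0.9, steps=200, seed=0):
    """Simulated annealing with geometric cooling t_{i+1} = alpha * t_i."""
    rng = random.Random(seed)
    current, best, t = start, start, t0
    for _ in range(steps):
        current = rejectionless_step(neighbourhood(current), cost, t, rng)
        if cost(current) < cost(best):
            best = current  # remember the best configuration seen so far
        t *= alpha
    return best


# Hypothetical toy problem: integers 0..20, cost minimal at 7.
def toy_cost(x):
    return (x - 7) ** 2


def toy_neighbourhood(x):
    return [y for y in (x - 1, x + 1) if 0 <= y <= 20]
```

The defaults `t0=2.0` and `alpha=0.9` mirror the values reported in the experimental section; the number of steps is arbitrary.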



5.2 Our Simulated Annealing Algorithm 

We are trying to obtain the intervals [a, b] in formula 4 for a given variable
of interest $X_k$, for all its cases $x_k^i$. For each $x_k^i$ this can be solved by
selecting the configuration of transparent variables that gives rise to a minimum value
for $a = P(x_k^i|e)$ (and the configuration that gives the maximum for b). We will use
a simulated annealing algorithm to search for those configurations of transparent
variables. Now, the search space S is the set of transparent variables and a
configuration s is a selection of one case for each of the transparent variables.
The algorithm starts by obtaining the probability trees associated with each original
conditional information, as described in section 4.1. Then, a tree of
cliques is built as described in section 2, but now we need a double system of
messages. For each pair of connected cliques, $C_i$ and $C_j$, there are two messages
going from $C_i$ to $C_j$, $M^1_{C_i \to C_j}$ and $M^2_{C_i \to C_j}$, and two messages going
from $C_j$ to $C_i$, $M^1_{C_j \to C_i}$ and $M^2_{C_j \to C_i}$. Messages $M^1$ are calculated as
usual, according to formula 2. Messages $M^2$ are also calculated as usual, but it is
assumed that the observation $X_k = x_k^i$ is added to the evidence e. Every probability
tree and every observation is associated with one clique $C_i$ that contains all its
variables. Then, the valuation $\Psi_{C_i}$ is calculated for every clique $C_i$ and saved
in a copy. A random initial configuration $s_0$ is selected for the
transparent variables, and the observation of these variables is appended to the
evidence set e and to the corresponding cliques, modifying the valuations
$\Psi_{C_i}$. These valuations $\Psi_{C_i}$ are now probability trees with no transparent
variables, because all of them are observed and pruned in the probability trees. After
these initial steps, we carry out a probabilistic propagation, calculating the two
types of messages, and we start the simulated annealing algorithm. To do that,
we traverse the tree of cliques N times (the desired number of runs) from the
root, in such a way that we always go from one clique to one of its neighbours.
This can be done by building a sequence of cliques $C_1, \ldots, C_n$ such that
$C_1$ is the root clique and any two consecutive cliques in the sequence
are neighbours in the tree of cliques. A clique $C_i$ can appear several
times in this sequence. Every time a clique $C_i$ is visited we simulate its
transparent variables and then we send messages $M^1_{C_i \to C_j}$ and $M^2_{C_i \to C_j}$ to the
next clique $C_j$ (a neighbour of $C_i$). The scheme of the algorithm is the following:

1. For r = 1 to N
   a) Let C be the root clique.
   b) For all transparent variables $T_j$ in the clique C
      i. Discard the observation of $T_j$ from the evidence set.
      ii. Calculate $R^1_{T_j}$ and $R^2_{T_j}$ (a posteriori information for $T_j$) using
          expression 3 with the two systems of messages.
      iii. Calculate the pointwise division $v = R^2_{T_j} / R^1_{T_j}$.
      iv. Select a new case $c_j$ for $T_j$ with a probability proportional to $e^{-v(c_j)/t}$.
      v. Add the observation $T_j = c_j$ to the evidence set.
   c) Let $C_j$ be the next clique visited.
   d) Send messages $M^1_{C \to C_j}$ and $M^2_{C \to C_j}$ to $C_j$.
   e) Let $C = C_j$ be the next clique to visit.
   f) If C is not the root clique, go to step 1b.
   g) Set $t = \alpha \cdot t$.
2. The output of the algorithm is the configuration of transparent variables with
   the minimum $R^2_{T_j}(c_j) / R^1_{T_j}(c_j)$ found so far.

One point that must be clarified in the previous algorithm is how to perform step 1(b)i,
that is, how to discard the previous observation of the variable $T_j$ that we are going to
simulate. This is easy to do because we keep a copy of the valuation $\Psi_{C_i}$ of the
clique $C_i$ that contains the transparent variable. In the copy no transparent variable
is instantiated. We then calculate the new $\Psi_{C_i}$ from the copy by instantiating all
transparent variables in $C_i$ except $T_j$. Another point that must be clarified, in step
1(b)ii, is the meaning of $R^1_{T_j}$ and $R^2_{T_j}$. Both are vectors of numbers. Each value in
$R^1_{T_j}$ contains the probability of the evidence $P(e)$ obtained under the current
configuration of transparent variables, and each value in $R^2_{T_j}$ contains the a posteriori
probability $P(x_k^i \cap e)$. Therefore $R^2_{T_j} / R^1_{T_j}$ is again a vector of values
containing $P(x_k^i|e) = P(x_k^i \cap e) / P(e)$, that is, the target of our optimization
algorithm.
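
The simulation of a single transparent variable (steps ii to iv of the scheme) can be illustrated with a small Python sketch. It assumes that the two vectors are already available as plain lists with one entry per case of the variable; the function name `select_case` and the guard against a zero probability of evidence are assumptions of this sketch, not part of the original algorithm.

```python
import math
import random


def select_case(r1, r2, t, rng):
    """Sample a case for one transparent variable.

    r1[c]: probability of the evidence when the variable takes case c.
    r2[c]: probability of the target case together with the evidence.
    Returns the sampled case index and its value of the target v = r2 / r1.
    """
    if any(a <= 0.0 for a in r1):
        raise ValueError("this sketch assumes P(e) > 0 for every case")
    v = [b / a for a, b in zip(r1, r2)]  # pointwise division
    v_min = min(v)
    # Proportional to exp(-v / t); shifting by v_min only rescales the weights.
    weights = [math.exp(-(x - v_min) / t) for x in v]
    case = rng.choices(range(len(v)), weights=weights, k=1)[0]
    return case, v[case]
```

Since every call yields a value of the target, each simulated variable provides a new candidate for the minimum, which is the property discussed in the next paragraph.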

A good property of the previous algorithm is that it obtains a new candidate for
the minimum every time a transparent variable is simulated. As a result, a single
traversal of the tree of cliques examines a large number of possible candidates.

6 Experimental Work 

To evaluate our simulated annealing algorithm we have applied it to several
Bayesian networks available on the Internet: Boerlage92 [2], Boblo, Car Starts and
Alarm. These graphs can be found in the literature for the probabilistic case,
i.e. at each node we have a conditional probability distribution. Propagation on
these networks is not very difficult from a probabilistic point of view. We have
transformed each probability p into a randomly chosen probability interval. This
makes the problem of exact propagation very difficult to solve, since it amounts
to making a tremendous number of probabilistic propagations: $9.007199 \times 10^{15}$ in
Boerlage92, 4.529848 x 10® in Boblo, 5.242888 x 10® in Car Starts and
$1.713495 \times 10^{93}$ in Alarm. To transform each probability p into an interval we use the
following procedure: for each p we select a uniform random number r from the interval
$[0, \min\{p, 1 - p, d\}]$, with $d < 1$ being a given threshold (we have used d = 0.1).
Then p is transformed into the interval $[p - r, p + r]$. This way of selecting the
interval ensures that $p - r \geq 0$ and $p + r \leq 1$. Moreover, when p = 0.0 or p = 1.0
we obtain [0.0, 0.0] or [1.0, 1.0] respectively.
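
A minimal Python sketch of this transformation is given below. It takes the bound on r to be the smallest of p, 1 - p and d, which is what the stated guarantees (no negative lower limit, degenerate intervals at 0 and 1) require; the function name `to_interval` is an assumption of this sketch.

```python
import random


def to_interval(p, d=0.1, rng=random):
    """Turn a probability p into an interval [p - r, p + r].

    r is drawn uniformly from [0, min(p, 1 - p, d)], so the interval stays
    inside [0, 1] and collapses to a point when p is 0 or 1.
    """
    r = rng.uniform(0.0, min(p, 1.0 - p, d))
    return (p - r, p + r)
```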

Experiments have been carried out on an Intel Pentium II (400 MHz) computer
with 384 MB of RAM and the Linux RedHat operating system with kernel
2.0.36. The algorithms have been implemented in C. We have used an initial
temperature of $t_0 = 2.0$ and a cooling factor of $\alpha = 0.9$. Runs with different numbers
of iterations have been carried out, computing the interval for the first case of one of
the variables of the network (the algorithm has been set up to optimize the lower
limit of that case).

Here we only reproduce results for the Boerlage92 network because of space limits.
Similar results can be obtained for the other networks. Figure 3 shows the
results obtained in three different instances of the problem. In situations (a) and
(b) we apply the algorithm to get intervals for two different variables when no
variable is observed. In situation (c) we apply the algorithm to another
variable, with one of the other variables observed. Figures (a), (b) and
(c) show the intervals for each number of iterations N (the horizontal axis represents
the number of iterations, and the vertical axis represents the probability).
Figures (a) and (c) also show the exact intervals obtained with an exact method
of propagation (we have used the variable elimination method [8] because it
can exploit the d-separation criterion). In situations (a) and (c) the algorithm
obtains the exact lower limit within a few iterations, but the upper limit does
not converge. This is because the simulated annealing algorithm has been run to
minimize the lower limit, not to maximize the upper limit. In situation (b) we
cannot be sure that the exact results are reached, because we were not able to
obtain results with an exact method of propagation due to the high complexity of
the problem. However, figure (b) suggests that the lower limit is stable after
50 iterations.



Fig. 3. Intervals for each number of iterations in the Boerlage92 network

7 Concluding Remarks

This paper has presented an approximate algorithm to obtain the a posteriori
intervals for a given variable when the a priori conditional information is also given
as intervals, and the independences among variables are represented
by a directed acyclic graph, as in a Bayesian network. The method uses probability
trees in order to reduce the size of the representation of the associated
convex set, and it applies a simulated annealing algorithm to obtain approximate
intervals. Experiments show that optimization techniques are promising for the
propagation of intervals when exact computations are infeasible.
Abstract. We study probabilistic logic under the viewpoint of the coherence principle
of de Finetti. In detail, we explore the relationship between coherence-based
and model-theoretic probabilistic logic. Interestingly, we show that the notions 
of g-coherence and of g-coherent entailment can be expressed by combining no- 
tions in model-theoretic probabilistic logic with concepts from default reasoning. 
Crucially, we even show that probabilistic reasoning under coherence is a proba- 
bilistic generalization of default reasoning in system P. That is, we provide a new 
probabilistic semantics for system P, which is neither based on infinitesimal prob- 
abilities nor on atomic-bound (or also big-stepped) probabilities. These results 
also give new insight into default reasoning with conditional objects. 



1 Introduction 

The probabilistic treatment of uncertainty plays an important role in many applications 
of knowledge representation and reasoning. Often, we need to reason with uncertain 
information under partial knowledge and then the use of precise probabilistic assessments 
seems unrealistic. Moreover, the family of uncertain quantities at hand has often no 
particular algebraic structure. 

In such cases, a general approach is obtained by using (conditional and/or uncon- 
ditional) probabilistic constraints, based on the coherence principle of de Finetti and 
suitable generalizations of it [5,9,15,16]. Two important aspects in dealing with un- 
certainty are: (i) checking the consistency of a probabilistic assessment; and (ii) the 
propagation of a given assessment to further uncertain quantities. 

Another approach for handling probabilistic constraints is model-theoretic proba- 
bilistic logic, whose roots go back to Boole’s book of 1854 “The Laws of Thought” 
[8]. There is a wide spectrum of formal languages that have been explored in proba- 
bilistic logic, which ranges from constraints for unconditional and conditional events [2, 
13,19,20,22,23] to rich languages that specify linear inequalities over events [12]. The 
main problems related to model-theoretic probabilistic logic are checking satisfiability,
deciding logical entailment, and computing tight logically entailed intervals. 

Coherence-based and model-theoretic probabilistic reasoning have been explored 
quite independently from each other by two different research communities. For this 
reason, the relationship between the two areas has not been studied in depth so far. The 
current paper and our work in [7] aim at filling this gap. More precisely, our research is 
essentially guided by the following two questions: 

• What is the semantic relationship between coherence-based and model-theoretic
probabilistic reasoning?

• Can algorithms that have been developed for efficient reasoning in one area also be 
used in the other area? 

Interestingly, it turns out that the answers to these two questions are closely related to 
default reasoning from conditional knowledge bases in system P. 

The literature contains several different proposals for default reasoning and extensive 
work on its desired properties. The core of these properties are the rationality postulates 
of system P proposed by Kraus, Lehmann, and Magidor [18]. It turned out that these 
rationality postulates constitute a sound and complete axiom system for several classical 
model-theoretic entailment relations under uncertainty measures on worlds. More pre- 
cisely, they characterize classical model-theoretic entailment under preferential struc- 
tures [25,18], infinitesimal probabilities [1,24], possibility measures [10], and world 
rankings. They also characterize an entailment relation based on conditional objects 
[11]. A survey of all these relationships is given in [3].

Roughly speaking, coherence-based probabilistic reasoning is reducible to model- 
theoretic probabilistic reasoning using concepts from default reasoning. Crucially, it even 
turns out that coherence-based probabilistic reasoning is a probabilistic generalization 
of default reasoning in system P. That is, we provide a new probabilistic semantics for 
system P, which is neither based on infinitesimal probabilities nor on atomic-bound (or 
also big-stepped) probabilities [4,26]. 

The current paper deals with the semantic aspects of these findings, while [7] focuses
on their algorithmic implications for coherence-based probabilistic reasoning.

The main contributions of the current paper can be summarized as follows: 

• We define a coherence-based probabilistic logic. We define a formal language of 
logical and conditional constraints, which are defined on arbitrary families of condi- 
tional events. We then define the notions of generalized coherence (or g-coherence), 
g-coherent consequence, and tight g-coherent consequence for this language. 

• We explore the relationship between g-coherence and g-coherent entailment, on the 
one hand, and satisfiability and logical entailment, on the other hand. 

• We show that probabilistic reasoning under coherence is a probabilistic generalization
of default reasoning from conditional knowledge bases in system P.

• We show that this relationship reveals new insight into Dubois and Prade's approach
to default reasoning with conditional objects [11,3].

Note that detailed proofs of all results are given in the extended paper [6]. 
2 Probabilistic Logic under Coherence

In this section, we first introduce some technical preliminaries. We then briefly describe 
precise and imprecise probability assessments under coherence. We finally define our 
coherence-based probabilistic logic and give an illustrating example. 

2.1 Preliminaries 

We assume a nonempty set of basic events $\Phi$. We use $\bot$ and $\top$ to denote false and
true, respectively. The set of events is the closure of $\Phi \cup \{\bot, \top\}$ under the
Boolean operations $\neg$ and $\wedge$. That is, each element of $\Phi \cup \{\bot, \top\}$ is an event,
and if $\phi$ and $\psi$ are events, then also $(\phi \wedge \psi)$ and $\neg\phi$. We use
$(\phi \vee \psi)$ and $(\psi \Leftarrow \phi)$ to abbreviate $\neg(\neg\phi \wedge \neg\psi)$ and
$\neg(\phi \wedge \neg\psi)$, respectively, and adopt the usual conventions to eliminate
parentheses. We often denote by $\bar{\phi}$ the negation $\neg\phi$, and by $\phi\psi$ the
conjunction $\phi \wedge \psi$. A logical constraint is an event of the form $\psi \Leftarrow \phi$.
Note that $\bot \Leftarrow \alpha$ is equivalent to $\neg\alpha$.

A world $I$ is a truth assignment to the basic events in $\Phi$ (that is, a mapping
$I : \Phi \to \{false, true\}$), which is extended to all events as usual (that is,
$(\phi \wedge \psi)$ is true in $I$ iff $\phi$ and $\psi$ are true in $I$, and $\neg\phi$ is true in $I$ iff
$\phi$ is not true in $I$). We use $\mathcal{I}_\Phi$ to denote the set of all worlds for $\Phi$. A world
$I$ satisfies an event $\phi$, or $I$ is a model of $\phi$, denoted $I \models \phi$, iff
$I(\phi) = true$. $I$ satisfies a set of events $L$, or $I$ is a model of $L$, denoted
$I \models L$, iff $I$ is a model of all $\phi \in L$. An event $\phi$ (resp., a set of events $L$) is
satisfiable iff a model of $\phi$ (resp., $L$) exists. An event $\psi$ is a logical consequence
of $\phi$ (resp., $L$), denoted $\phi \models \psi$ (resp., $L \models \psi$), iff each model of $\phi$
(resp., $L$) is also a model of $\psi$. We use $\phi \not\models \psi$ (resp., $L \not\models \psi$) to
denote that $\phi \models \psi$ (resp., $L \models \psi$) does not hold.

2.2 Probability Assessments 

A conditional event is an expression $\psi|\phi$ with events $\psi$ and $\phi \neq \bot$. It can be
looked at as a three-valued logical entity, with values true, or false, or indeterminate,
according to whether $\psi$ and $\phi$ are true, or $\psi$ is false and $\phi$ is true, or $\phi$ is
false, respectively. That is, we extend worlds $I$ to conditional events $\psi|\phi$ by
$I(\psi|\phi) = true$ iff $I \models \psi \wedge \phi$, $I(\psi|\phi) = false$ iff $I \models \neg\psi \wedge \phi$, and
$I(\psi|\phi) = indeterminate$ iff $I \models \neg\phi$. Note that $\psi|\phi$ coincides with
$\psi \wedge \phi|\phi$. More generally, $\psi_1|\phi_1$ and $\psi_2|\phi_2$ coincide iff
$\psi_1 \wedge \phi_1 = \psi_2 \wedge \phi_2$ and $\phi_1 = \phi_2$.

A probability assessment $(L, A)$ on a set of conditional events $\mathcal{E}$ consists of a set
of logical constraints $L$, and a mapping $A$ that assigns each $\varepsilon \in \mathcal{E}$ a real number in
[0, 1]. Informally, $L$ describes logical relationships, while $A$ represents probabilistic
knowledge. For $\{\psi_1|\phi_1, \ldots, \psi_n|\phi_n\} \subseteq \mathcal{E}$ with $n \geq 1$ and n real numbers
$s_1, \ldots, s_n$, let the mapping $G : \mathcal{I}_\Phi \to \mathbb{R}$ be defined as follows. For every
$I \in \mathcal{I}_\Phi$:

$G(I) = \sum_{i=1}^{n} s_i \cdot I(\phi_i) \cdot (I(\psi_i) - A(\psi_i|\phi_i))$.

In the previous formula, we identify the truth values false and true with the real
numbers 0 and 1, respectively. Intuitively, $G$ can be interpreted as the random gain
corresponding to a combination of n bets of amounts $s_1 \cdot A(\psi_1|\phi_1), \ldots, s_n \cdot A(\psi_n|\phi_n)$
on $\psi_1|\phi_1, \ldots, \psi_n|\phi_n$ with stakes $s_1, \ldots, s_n$. In detail, to bet on $\psi_i|\phi_i$, one pays an
amount of $s_i \cdot A(\psi_i|\phi_i)$, and one gets back the amount of $s_i$, 0, and $s_i \cdot A(\psi_i|\phi_i)$, when
$\psi_i \wedge \phi_i$, $\neg\psi_i \wedge \phi_i$, and $\neg\phi_i$, respectively, turns out to be true. The following
notion of coherence now assures that it is impossible (for both the gambler and the
bookmaker) to have uniform loss.
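
As an illustration only, the three-valued reading of a conditional event and the random gain $G(I)$ can be written as a short Python sketch. Events are modelled here as Python predicates over a world, and a world as a dict from basic-event names to booleans; this encoding and the names `conditional_value` and `random_gain` are assumptions of the sketch, not notation from the paper.

```python
def conditional_value(world, psi, phi):
    """Three-valued truth of psi|phi in a world.

    Returns True, False, or None (None stands for 'indeterminate').
    """
    if not phi(world):
        return None
    return bool(psi(world))


def random_gain(world, bets, assessment):
    """G(I) = sum_i s_i * I(phi_i) * (I(psi_i) - A(psi_i|phi_i)).

    bets: list of (name, psi, phi, stake) tuples.
    assessment: dict mapping each name to the assessed probability A.
    Truth values are identified with 0 and 1, as in the text.
    """
    total = 0.0
    for name, psi, phi, stake in bets:
        total += stake * int(phi(world)) * (int(psi(world)) - assessment[name])
    return total
```

With a single bet of stake 10 on fly|bird assessed at 0.9, the gain is about 1 in a world where the bird flies, -9 where it does not, and 0 where there is no bird (the bet is called off).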

A probability assessment $(L, A)$ on a set of conditional events $\mathcal{E}$ is coherent iff for
every $\{\psi_1|\phi_1, \ldots, \psi_n|\phi_n\} \subseteq \mathcal{E}$ with $n \geq 1$ and for all real numbers
$s_1, \ldots, s_n$, it holds that

$\max \{ G(I) \mid I \in \mathcal{I}_\Phi,\ I \models L,\ I \models \phi_1 \vee \cdots \vee \phi_n \} \geq 0$.

An imprecise probability assessment $(L, A)$ on a set of conditional events $\mathcal{E}$ consists
of a set of logical constraints $L$ and a mapping $A$ that assigns each $\varepsilon \in \mathcal{E}$ an interval
$[l, u] \subseteq [0, 1]$ with $l \leq u$. We say $(L, A)$ is g-coherent iff there exists a coherent precise
probability assessment $(L, A^*)$ on $\mathcal{E}$ such that $A^*(\varepsilon) \in A(\varepsilon)$ for all $\varepsilon \in \mathcal{E}$.

Let $(L, A)$ be a g-coherent imprecise probability assessment on a set of conditional
events $\mathcal{E}$. The imprecise probability assessment $[l, u]$ on a conditional event $\gamma$ is called
a g-coherent consequence of $(L, A)$ iff $A^*(\gamma) \in [l, u]$ for every g-coherent precise
probability assessment $A^*$ on $\mathcal{E} \cup \{\gamma\}$ such that $A^*(\varepsilon) \in A(\varepsilon)$ for all
$\varepsilon \in \mathcal{E}$. It is a tight g-coherent consequence of $(L, A)$ iff $l$ (resp., $u$) is the infimum
(resp., supremum) of $A^*(\gamma)$ subject to all g-coherent precise probability assessments
$A^*$ on $\mathcal{E} \cup \{\gamma\}$ such that $A^*(\varepsilon) \in A(\varepsilon)$ for all $\varepsilon \in \mathcal{E}$.

2.3 Probabilistic Logic under Coherence 

In the rest of this paper, we assume that $\Phi$ is finite. A conditional constraint is an
expression $(\psi|\phi)[l, u]$ with real numbers $l, u \in [0, 1]$ and events $\psi$ and $\phi$. A
probabilistic knowledge base $KB = (L, P)$ consists of a finite set of logical constraints
$L$, and a finite set of conditional constraints $P$ such that (i) $l \leq u$ for all
$(\psi|\phi)[l, u] \in P$, and (ii) $\psi_1|\phi_1 \neq \psi_2|\phi_2$ for all distinct
$(\psi_1|\phi_1)[l_1, u_1], (\psi_2|\phi_2)[l_2, u_2] \in P$.

Every imprecise probability assessment $IP = (L, A)$ with finite $L$ on a finite set of
conditional events $\mathcal{E}$ can be represented by the following probabilistic knowledge base:

$KB_{IP} = (L, \{(\psi|\phi)[l, u] \mid \psi|\phi \in \mathcal{E},\ A(\psi|\phi) = [l, u]\})$.

Conversely, every probabilistic knowledge base $KB = (L, P)$ can be expressed by the
following imprecise probability assessment $IP_{KB} = (L, A_{KB})$ on $\mathcal{E}_{KB}$:

$A_{KB} = \{(\psi|\phi, [l, u]) \mid (\psi|\phi)[l, u] \in KB\}$,

$\mathcal{E}_{KB} = \{\psi|\phi \mid \exists\, l, u \in [0, 1] : (\psi|\phi)[l, u] \in KB\}$.

A probabilistic knowledge base $KB$ is said to be g-coherent iff $IP_{KB}$ is g-coherent. For
g-coherent $KB$ and conditional constraints $(\psi|\phi)[l, u]$, we say $(\psi|\phi)[l, u]$ is a
g-coherent consequence of $KB$, denoted $KB \mathrel{\|\!\sim}^{g} (\psi|\phi)[l, u]$, iff
$\{(\psi|\phi, [l, u])\}$ is a g-coherent consequence of $IP_{KB}$. It is a tight g-coherent
consequence of $KB$, denoted $KB \mathrel{\|\!\sim}^{g}_{tight} (\psi|\phi)[l, u]$, iff
$\{(\psi|\phi, [l, u])\}$ is a tight g-coherent consequence of $IP_{KB}$.

Example 2.1. The logical knowledge “all penguins are birds” and the probabilistic
knowledge “birds have legs with a probability of at least 0.95”, “birds fly with a
probability between 0.9 and 0.95”, and “penguins fly with a probability of at most 0.05” can
be expressed by the following probabilistic knowledge base:

$KB = (\{bird \Leftarrow penguin\}, \{(legs|bird)[.95, 1], (fly|bird)[.9, .95], (fly|penguin)[0, .05]\})$.

It is easy to see that $KB$ is g-coherent and that $(legs|bird)[.95, 1]$, $(legs|penguin)[0, 1]$,
$(fly|bird)[.9, .95]$, and $(fly|penguin)[0, .05]$ are tight g-coherent consequences of $KB$.
3 Relationship to Model-Theoretic Probabilistic Logic

In this section, we characterize the notions of g-coherence and of g-coherent entailment
in terms of the notions of satisfiability and of logical entailment.

3.1 Model-Theoretic Probabilistic Logic 

A probabilistic interpretation $Pr$ is a probability function on $\mathcal{I}_\Phi$ (that is, a mapping
$Pr : \mathcal{I}_\Phi \to [0, 1]$ such that all $Pr(I)$ with $I \in \mathcal{I}_\Phi$ sum up to 1). The
probability of an event $\phi$ in the probabilistic interpretation $Pr$, denoted $Pr(\phi)$, is
defined as the sum of all $Pr(I)$ such that $I \in \mathcal{I}_\Phi$ and $I \models \phi$. For events $\phi$
and $\psi$ with $Pr(\phi) > 0$, we use $Pr(\psi|\phi)$ to abbreviate $Pr(\psi \wedge \phi) / Pr(\phi)$. The
truth of logical and conditional constraints $F$ in a probabilistic interpretation $Pr$,
denoted $Pr \models F$, is defined as follows:

• $Pr \models \psi \Leftarrow \phi$ iff $Pr(\psi \wedge \phi) = Pr(\phi)$.

• $Pr \models (\psi|\phi)[l, u]$ iff $Pr(\phi) = 0$ or $Pr(\psi|\phi) \in [l, u]$.

We say $Pr$ satisfies a logical or conditional constraint $F$, or $Pr$ is a model of $F$, iff
$Pr \models F$. We say $Pr$ satisfies a set of logical and conditional constraints $\mathcal{F}$, or $Pr$
is a model of $\mathcal{F}$, denoted $Pr \models \mathcal{F}$, iff $Pr$ is a model of all $F \in \mathcal{F}$. We say
that $\mathcal{F}$ is satisfiable iff a model of $\mathcal{F}$ exists.
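
The two truth conditions can be checked mechanically for a finite interpretation. The Python sketch below is illustrative only: a probabilistic interpretation is represented as a list of (world, probability) pairs, events are predicates over worlds, and the function names and the numerical tolerance `tol` are assumptions introduced here.

```python
def prob(pr, event):
    """Pr(event): sum of the probabilities of the worlds satisfying event."""
    return sum(p for world, p in pr if event(world))


def satisfies_logical(pr, psi, phi, tol=1e-12):
    """Pr |= psi <= phi  iff  Pr(psi and phi) = Pr(phi)."""
    both = prob(pr, lambda w: psi(w) and phi(w))
    return abs(both - prob(pr, phi)) <= tol


def satisfies_conditional(pr, psi, phi, l, u, tol=1e-12):
    """Pr |= (psi|phi)[l,u]  iff  Pr(phi) = 0 or Pr(psi|phi) in [l,u]."""
    p_phi = prob(pr, phi)
    if p_phi == 0:
        return True  # vacuously satisfied when the condition has probability 0
    cond = prob(pr, lambda w: psi(w) and phi(w)) / p_phi
    return l - tol <= cond <= u + tol
```

Note that a conditional constraint is satisfied by any interpretation that gives its condition probability zero, whatever the interval.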

We next define the notion of logical entailment. A conditional constraint
$F = (\psi|\phi)[l, u]$ is a logical consequence of a set of logical and conditional constraints
$\mathcal{F}$, denoted $\mathcal{F} \models F$, iff each model of $\mathcal{F}$ is also a model of $F$. It is a tight
logical consequence of $\mathcal{F}$, denoted $\mathcal{F} \models_{tight} F$, iff $l$ (resp., $u$) is the infimum
(resp., supremum) of $Pr(\psi|\phi)$ subject to all models $Pr$ of $\mathcal{F}$ with $Pr(\phi) > 0$. Note
that we define $l = 1$ and $u = 0$ when $\mathcal{F} \models (\phi|\top)[0, 0]$. A probabilistic knowledge
base $KB = (L, P)$ is satisfiable iff $L \cup P$ is satisfiable. A conditional constraint
$(\psi|\phi)[l, u]$ is a logical consequence of $KB$, denoted $KB \models (\psi|\phi)[l, u]$, iff
$L \cup P \models (\psi|\phi)[l, u]$. It is a tight logical consequence of $KB$, denoted
$KB \models_{tight} (\psi|\phi)[l, u]$, iff $L \cup P \models_{tight} (\psi|\phi)[l, u]$.

3.2 G-Coherence in Model- Theoretic Probabilistic Logic 

The following theorem shows how g-coherence can be expressed through the existence of
probabilistic interpretations. This result follows from a characterization of g-coherence
in [15]. It shows that $KB = (L, P)$ is g-coherent iff every nonempty $P' \subseteq P$ has a model
$Pr$ such that $Pr \models L$ and $Pr(\phi) > 0$ for at least one $(\psi|\phi)[l, u] \in P'$.

Theorem 3.1. Let $KB = (L, P)$ be a probabilistic knowledge base. Then, $KB$ is
g-coherent iff for every nonempty $P_n = \{(\psi_1|\phi_1)[l_1, u_1], \ldots, (\psi_n|\phi_n)[l_n, u_n]\} \subseteq P$
there exists a model $Pr$ of $L \cup P_n$ such that $Pr(\phi_1 \vee \cdots \vee \phi_n) > 0$.

It then follows that g-coherence has a characterization similar to p-consistency in
default reasoning. To formulate this result, we adopt the following terminology from
default reasoning from conditional knowledge bases [3]. A probabilistic interpretation
$Pr$ verifies a conditional constraint $(\psi|\phi)[l, u]$ iff $Pr(\phi) > 0$ and
$Pr \models (\psi|\phi)[l, u]$. A set of conditional constraints $P$ tolerates a conditional
constraint $F$ under a set of logical constraints $L$ iff there exists a model of $L \cup P$
that verifies $F$. We say $P$ is under $L$ in conflict with $F$ iff no model of $L \cup P$
verifies $F$.
We are now ready to characterize g-coherence in a way similar to p-consistency by
Goldszmidt and Pearl [17]. Note that in [7] we use this characterization to provide a new
algorithm for deciding g-coherence, which is essentially a reformulation of a previous
algorithm by Gilio [15] using terminology from default reasoning, and which is closely
related to an algorithm for checking p-consistency given in [17].*

Theorem 3.2. A probabilistic knowledge base $KB = (L, P)$ is g-coherent iff there exists
an ordered partition $(P_0, \ldots, P_k)$ of $P$ such that either

(a) every $P_i$, $0 \leq i \leq k$, is the set of all $F \in \bigcup_{j=i}^{k} P_j$ tolerated under $L$ by $\bigcup_{j=i}^{k} P_j$, or

(b) for every $i$, $0 \leq i \leq k$, each $F \in P_i$ is tolerated under $L$ by $\bigcup_{j=i}^{k} P_j$.
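
Condition (a) of Theorem 3.2 suggests a layer-by-layer construction, sketched below in Python. The sketch is an assumption-laden illustration and not the algorithm of [7] or [15]: the tolerance test is passed in as a hypothetical oracle `tolerated(F, rest)` that answers whether F is tolerated under L by the set `rest`. Deciding tolerance itself requires a satisfiability test over probabilistic interpretations, which is not sketched here.

```python
def ordered_partition(constraints, tolerated):
    """Build the ordered partition of Theorem 3.2(a), if it exists.

    constraints: list of conditional constraints (any hashable objects).
    tolerated(f, rest): hypothetical oracle, True iff f is tolerated
        (under the fixed logical constraints L) by the list `rest`.
    Returns the list of layers P_0, ..., P_k, or None when some nonempty
    remainder contains no tolerated constraint (no such partition exists).
    """
    remaining = list(constraints)
    partition = []
    while remaining:
        layer = [f for f in remaining if tolerated(f, remaining)]
        if not layer:
            return None
        partition.append(layer)
        remaining = [f for f in remaining if f not in layer]
    return partition
```

Each layer removes at least one constraint, so the oracle is called at most a quadratic number of times in the size of P.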

3.3 G-Coherent Entailment in Model-Theoretic Probabilistic Logic 

We next show how g-coherent entailment can be reduced to logical entailment.

For probabilistic knowledge bases $KB = (L, P)$ and events $\alpha$, let $P_\alpha(KB)$ denote
the set of all subsets $P_n = \{(\psi_1|\phi_1)[l_1, u_1], \ldots, (\psi_n|\phi_n)[l_n, u_n]\}$ of $P$ such that
every model $Pr$ of $L \cup P_n$ with $Pr(\phi_1 \vee \cdots \vee \phi_n \vee \alpha) > 0$ satisfies $Pr(\alpha) > 0$.

The following theorem shows that the tight interval concluded under coherence can
be expressed as the intersection of some logically entailed tight intervals.

Theorem 3.3. Let $KB = (L, P)$ be a g-coherent probabilistic knowledge base, and
let $\beta|\alpha$ be a conditional event. Then, $KB \mathrel{\|\!\sim}^{g}_{tight} (\beta|\alpha)[l, u]$, where

$[l, u] = \bigcap \{[c, d] \mid L \cup P' \models_{tight} (\beta|\alpha)[c, d] \text{ for some } P' \in P_\alpha(KB)\}$.

Clearly, this reduction of g-coherent entailment to logical entailment is computationally
expensive, as we have to compute a tight logically entailed interval for each
member of $P_\alpha(KB)$. In the following, we show that we can restrict our attention to the
unique greatest element in $P_\alpha(KB)$. The following lemma shows that $P_\alpha(KB)$ indeed
contains a unique greatest element with respect to set inclusion. This result can be proved
by showing that $P_\alpha(KB)$ is nonempty and closed under set union.

Lemma 3.4. Let $KB = (L, P)$ be a g-coherent probabilistic knowledge base, and let $\alpha$
be an event. Then, $P_\alpha(KB)$ contains a unique greatest element.

The next theorem now shows the crucial result that g-coherent entailment from $KB$
can be reduced to logical entailment from the greatest element in $P_\alpha(KB)$.

Theorem 3.5. Let $KB = (L, P)$ be a g-coherent probabilistic knowledge base, and let
$F = (\beta|\alpha)[l, u]$ be a conditional constraint. Let $KB^* = (L, P^*)$, where $P^*$ is the greatest
element in $P_\alpha(KB)$. Then,

(a) $KB \mathrel{\|\!\sim}^{g} F$ iff $KB^* \models F$.

(b) $KB \mathrel{\|\!\sim}^{g}_{tight} F$ iff $KB^* \models_{tight} F$.

* Note that the relationship between the algorithms in [15] and [17] was suggested first by Didier
Dubois (personal communication).
Thus, computing tight g-coherent consequences can be reduced to computing tight
logical consequences from the greatest element $P^*$ in $P_\alpha(KB)$. The following theorem
shows how $P^*$ can be characterized and thus computed. More precisely, it specifies
some $P^*$ by two conditions (i) and (ii). It can be shown that (i) implies that every
member of $P_\alpha(KB)$ is a subset of $P^*$, and that (ii) implies that $P^*$ belongs to $P_\alpha(KB)$.
In summary, this proves that the specified $P^*$ is the greatest element in $P_\alpha(KB)$.

Theorem 3.6. Let $KB = (L, P)$ be a g-coherent probabilistic knowledge base and $\alpha$ be
an event. Let $P^* \subseteq P$ and $(P_0, \ldots, P_k)$ be an ordered partition of $P \setminus P^*$ such that:

(i) every $P_i$, $0 \leq i \leq k$, is the set of all elements in $P_i \cup \cdots \cup P_k \cup P^*$ that are
tolerated under $L \cup \{\bot \Leftarrow \alpha\}$ by $P_i \cup \cdots \cup P_k \cup P^*$, and

(ii) no member of $P^*$ is tolerated under $L \cup \{\bot \Leftarrow \alpha\}$ by $P^*$.

Then, $P^*$ is the greatest element in $P_\alpha(KB)$.

In summary, by Theorems 3.5 and 3.6, a tight interval under g-coherent entailment
can be computed by first checking g-coherence, and then computing a tight interval
under logical entailment [7]. Semantically, Theorems 3.5 and 3.6 show that g-coherent
entailment coincides with logical entailment from a smaller knowledge base. That is,
under g-coherent entailment, we simply cut away a part of the knowledge base. Roughly
speaking, we remove all those conditional constraints $(\psi|\phi)[l, u] \in P$ where $\phi$ is
“larger” than $\alpha$. Intuitively, g-coherent entailment does not have the property of
inheritance, neither for logical knowledge nor for probabilistic knowledge, while logical
entailment shows inheritance of logical knowledge but not of probabilistic knowledge.
The following example illustrates this difference.

Example 3.7. Consider the following probabilistic knowledge base:

$KB = (\{bird \Leftarrow penguin\}, \{(legs|bird)[1, 1], (wings|bird)[.95, 1]\})$.

Notice that $KB$ is g-coherent and satisfiable. Moreover, we have:

$KB \mathrel{\|\!\sim}^{g}_{tight} (legs|penguin)[0, 1]$ and $KB \models_{tight} (legs|penguin)[1, 1]$,

$KB \mathrel{\|\!\sim}^{g}_{tight} (wings|penguin)[0, 1]$ and $KB \models_{tight} (wings|penguin)[0, 1]$.

That is, under g-coherent entailment, neither the logical property of having legs nor the
probabilistic one of having wings is inherited from birds to penguins. Under logical
entailment, however, the logical property is inherited, while the probabilistic one is not.

3.4 Coherence-Based versus Model-Theoretic Probabilistic Logic

We now describe the rough relationship between g-coherence and satisfiability, and
between g-coherent entailment and logical entailment. The following theorem shows
that g-coherence implies satisfiability. This result is immediate by Theorem 3.1.

Theorem 3.8. Every g-coherent probabilistic knowledge base $KB$ is satisfiable.

In fact, g-coherence is strictly stronger than satisfiability, as the next example shows.

Example 3.9. Consider the probabilistic knowledge base
$KB = (\emptyset, \{(fly|bird)[.9, 1], (\neg fly|bird)[.2, 1]\})$. It is easy to verify that $KB$ is
satisfiable, but not g-coherent.

The next theorem shows that logical entailment is stronger than g-coherent entailment.
That is, g-coherent consequence implies logical consequence (or there are more
conditional constraints logically entailed than entailed under g-coherence) and the tight
intervals that are derived under logical entailment are subintervals of those derived under
g-coherent entailment. This result follows immediately from Theorem 3.5.

Theorem 3.10. Let $KB = (L, P)$ be a g-coherent probabilistic knowledge base, and
let $(\beta|\alpha)[l, u]$ and $(\beta|\alpha)[r, s]$ be two conditional constraints. Then,

(a) $KB \mathrel{\|\!\sim}^{g} (\beta|\alpha)[l, u]$ implies $KB \models (\beta|\alpha)[l, u]$.

(b) $KB \mathrel{\|\!\sim}^{g}_{tight} (\beta|\alpha)[l, u]$ and $KB \models_{tight} (\beta|\alpha)[r, s]$ implies $[l, u] \supseteq [r, s]$.

The following example now shows that logical entailment is in fact strictly stronger
than g-coherent entailment (note that we identify [1, 0] with the empty set).

Example 3.11. Consider the following probabilistic knowledge bases $KB_1$ and $KB_2$:

$KB_1 = (\emptyset, \{(fly|bird)[1, 1], (mobile|fly)[1, 1]\})$,

$KB_2 = (\emptyset, \{(fly|bird)[1, 1], (bird|penguin)[1, 1], (\neg fly|penguin)[1, 1]\})$.

Some tight g-coherent and tight logical consequences of $KB_1$ and $KB_2$ are given by:

$KB_1 \mathrel{\|\!\sim}^{g}_{tight} (mobile|bird)[0, 1]$ and $KB_1 \models_{tight} (mobile|bird)[1, 1]$,

$KB_2 \mathrel{\|\!\sim}^{g}_{tight} (\neg fly|penguin)[1, 1]$ and $KB_2 \models_{tight} (\neg fly|penguin)[1, 0]$.



4 Relationship to Default Reasoning in System P 

In this section, we show that consistency and entailment in system P are special cases of 
g-coherence and of g-coherent entailment, respectively. That is, probabilistic logic under 
coherence gives a new probabilistic semantics for system P, which is neither based on 
infinitesimal probabilities nor on atomic-bound (or also big-stepped) probabilities. 

4.1 Default Reasoning in System P 

We now describe the notions of consistency and of entailment in system P [18]. We 
define them in terms of world rankings. 

A conditional rule (or default) is an expression of the form $\psi \leftarrow \phi$, where $\psi$ and
$\phi$ are events. A conditional knowledge base $KB = (L, D)$ consists of a finite set of logical
constraints $L$ and a finite set of defaults $D$.

A world $I$ satisfies a default $\psi \leftarrow \phi$, or $I$ is a model of $\psi \leftarrow \phi$, denoted
$I \models \psi \leftarrow \phi$, iff $I \models \psi \Leftarrow \phi$. The world $I$ verifies $\psi \leftarrow \phi$ iff
$I \models \phi \wedge \psi$. The world $I$ falsifies $\psi \leftarrow \phi$ iff $I \models \phi \wedge \neg\psi$ (that is,
$I \not\models \psi \Leftarrow \phi$). $I$ satisfies a set of events and defaults $K$, or $I$ is a
model of $K$, denoted $I \models K$, iff $I$ satisfies every member of $K$. We say $K$ is satisfiable
iff a model of $K$ exists. A set of defaults $D$ tolerates a default $d$ under a set of classical
formulas $L$ iff $D \cup L$ has a model that verifies $d$. A set of defaults $D$ is under $L$ in
conflict with a default $\psi \leftarrow \phi$ iff all models of $D \cup L \cup \{\phi\}$ satisfy $\neg\psi$.
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A world ranking k is a mapping k: I 4 , — >■ {0, 1, . . . } U { 00 } such that k{I) = 0 
for at least one world I. It is extended to all events (p as follows. If p is satisfiable, 
then k{4>) = min {k(/) 1 1 I |= </)}; otherwise, «;(</>) = 00 . A world ranking k is 
admissible with a conditional knowledge base (L, D) iff K{~'(j>) = 00 for all (f> G L, and 
k{4>) < 00 and k{ 4> Atp) < K{(f> A for all defaults G D. 

A conditional knowledge base KB is p -consistent iff there exists a world ranking 
that is admissible with KB. It is p-inconsistent iff no such a world ranking exists. We 
say KB p-entails a default iff either K{(p) = 00 (that is, <j) is unsatisfiable) 

or K{(f> Alp) < k{4> a - 1 ^) for all world rankings k that are admissible with KB. 

A default ranking a on KB = (L, D) maps each d G 19 to a nonnegative integer. 
It is admissible with KB iff each D' C D that is under L in conflict with some dG D 
contains a default d' such that a{d') < a(d). 

4.2 G-Coherence and P-Consistency 

We now show that g-coherence is a generalization of p-consistency.

Recall first that the characterization of p-consistency by Goldszmidt and Pearl [17] 
corresponds to the characterization of g-coherence given in Theorem 3.2. 
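
The tolerance test behind this kind of characterization can be sketched in Python. This is only a minimal sketch under assumptions that are not part of the text: worlds are enumerated by brute force over a small list of atoms, events are Python predicates on worlds, a default $\psi \leftarrow \phi$ is stored as the pair `(psi, phi)`, and the helper names `worlds`, `tolerates` and `tolerance_partition` are hypothetical. The procedure assumed here is the usual layer-by-layer one: repeatedly remove the defaults that are tolerated by the remaining ones, and report p-consistency iff every default is eventually removed.

```python
from itertools import product

def worlds(atoms):
    """Enumerate all truth assignments (worlds) over the given atoms."""
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))

def tolerates(atoms, logic, defaults, d):
    """True iff D tolerates d under L: some model of D and L verifies d."""
    psi_d, phi_d = d
    for w in worlds(atoms):
        if not all(c(w) for c in logic):
            continue  # w violates a logical constraint
        if not all((not phi(w)) or psi(w) for psi, phi in defaults):
            continue  # w falsifies some default in D
        if phi_d(w) and psi_d(w):
            return True  # w verifies d
    return False

def tolerance_partition(atoms, logic, defaults):
    """Return the layers as lists of indices into defaults, or None."""
    if not any(all(c(w) for c in logic) for w in worlds(atoms)):
        return None  # L itself is unsatisfiable
    remaining = list(range(len(defaults)))
    layers = []
    while remaining:
        current = [defaults[i] for i in remaining]
        layer = [i for i in remaining
                 if tolerates(atoms, logic, current, defaults[i])]
        if not layer:
            return None  # nothing is tolerated: not p-consistent
        layers.append(layer)
        remaining = [i for i in remaining if i not in layer]
    return layers

ATOMS = ["bird", "penguin", "fly"]
# Defaults are (psi, phi) pairs standing for psi <- phi.
PENGUINS = [
    (lambda w: w["fly"], lambda w: w["bird"]),         # fly <- bird
    (lambda w: w["bird"], lambda w: w["penguin"]),     # bird <- penguin
    (lambda w: not w["fly"], lambda w: w["penguin"]),  # not fly <- penguin
]
CLASH = [
    (lambda w: w["fly"], lambda w: w["bird"]),         # fly <- bird
    (lambda w: not w["fly"], lambda w: w["bird"]),     # not fly <- bird
]

penguin_layers = tolerance_partition(ATOMS, [], PENGUINS)
clash_layers = tolerance_partition(ATOMS, [], CLASH)
```

For the penguin defaults the sketch finds two layers (first fly <- bird, then the two penguin defaults), while the clashing pair yields no partition at all.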

The following well-known result (see especially [14]) shows that p-consistency is 
equivalent to the existence of admissible default rankings. 

Theorem 4.1. A conditional knowledge base KB is p-consistent iff there exists a default 
ranking on KB that is admissible with KB. 

A similar result holds for g-coherence, which is subsequently formulated using the
following concepts. A ranking $\sigma$ on KB = (L, P) maps each element of P to a nonnegative
integer. It is admissible with KB iff each $P' \subseteq P$ that is under L in conflict with
some $F \in P$ contains a conditional constraint $F'$ such that $\sigma(F') < \sigma(F)$.

Theorem 4.2. A probabilistic knowledge base KB is g-coherent iff there exists a ranking 
on KB that is admissible with KB. 

The following theorem finally shows the important result that g-coherence is a gen- 
eralization of p-consistency. 

Theorem 4.3. Let $KB = (L, \{(\psi_1|\phi_1)[1,1], \ldots, (\psi_n|\phi_n)[1,1]\})$ be a probabilistic
knowledge base. Then, KB is g-coherent iff the conditional knowledge base
$KB' = (L, \{\psi_1 \leftarrow \phi_1, \ldots, \psi_n \leftarrow \phi_n\})$ is p-consistent.

4.3 G-Coherent Entailment and P-Entailment 

We now show that g-coherent entailment is a generalization of p-entailment.

The following result is essentially due to Adams [1], who formulated it for $L = \emptyset$.

Theorem 4.4 (Adams [1]). A conditional knowledge base KB = (L, D) p-entails a
default $\beta \leftarrow \alpha$ iff $(L, D \cup \{\neg\beta \leftarrow \alpha\})$ is p-inconsistent.
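
The reduction of p-entailment to p-inconsistency in Theorem 4.4 can be tried out on small examples. The following is a hedged sketch, not the paper's procedure: it assumes a brute-force enumeration of worlds, represents a default $\psi \leftarrow \phi$ as a pair of Python predicates `(psi, phi)`, and decides p-consistency with the usual tolerance test (peeling off tolerated defaults). The names `p_consistent` and `p_entails` are hypothetical.

```python
from itertools import product

def p_consistent(atoms, logic, defaults):
    """Tolerance test: peel off tolerated defaults until none are left."""
    all_worlds = [dict(zip(atoms, v))
                  for v in product([False, True], repeat=len(atoms))]
    models_of_l = [w for w in all_worlds if all(c(w) for c in logic)]
    if not models_of_l:
        return False
    remaining = list(defaults)
    while remaining:
        # Worlds that satisfy L and falsify no remaining default.
        models = [w for w in models_of_l
                  if all((not phi(w)) or psi(w) for psi, phi in remaining)]
        # Keep the defaults that no such world verifies.
        untolerated = [(psi, phi) for psi, phi in remaining
                       if not any(phi(w) and psi(w) for w in models)]
        if len(untolerated) == len(remaining):
            return False
        remaining = untolerated
    return True

def p_entails(atoms, logic, defaults, beta, alpha):
    """Theorem 4.4: add (not beta) <- alpha and test for p-inconsistency."""
    negated = (lambda w: not beta(w), alpha)
    return not p_consistent(atoms, logic, defaults + [negated])

ATOMS = ["bird", "penguin", "fly", "mobile"]
KB1 = [
    (lambda w: w["fly"], lambda w: w["bird"]),         # fly <- bird
    (lambda w: w["mobile"], lambda w: w["fly"]),       # mobile <- fly
]
KB2 = [
    (lambda w: w["fly"], lambda w: w["bird"]),         # fly <- bird
    (lambda w: w["bird"], lambda w: w["penguin"]),     # bird <- penguin
    (lambda w: not w["fly"], lambda w: w["penguin"]),  # not fly <- penguin
]

chaining = p_entails(ATOMS, [], KB1,
                     lambda w: w["mobile"], lambda w: w["bird"])
specific = p_entails(ATOMS, [], KB2,
                     lambda w: not w["fly"],
                     lambda w: w["penguin"] and w["bird"])
```

In this sketch `chaining` is `False`: the default mobile <- bird is not p-entailed by fly <- bird and mobile <- fly, which agrees with the tight g-coherent interval [0, 1] for (mobile|bird) in Example 3.11. The value `specific` is `True`.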

The following theorem shows that a similar result holds for g-coherent consequence,
which is an immediate implication of the definition of g-coherent entailment.

Theorem 4.5. Let KB = (L, P) be a g-coherent probabilistic knowledge base, and let
$(\beta|\alpha)[l,u]$ be a conditional constraint. Then, $KB \mathrel{|\!\sim}^{g} (\beta|\alpha)[l,u]$ iff
$(L, P \cup \{(\beta|\alpha)[p,p]\})$ is not g-coherent for all $p \in [0,l) \cup (u,1]$.

The following related result for tight g-coherent consequence completes the picture. 



Theorem 4.6. Let KB = (L, P) be a g-coherent probabilistic knowledge base, and let
$(\beta|\alpha)[l,u]$ be a conditional constraint. Then, $KB \mathrel{|\!\sim}^{g}_{tight} (\beta|\alpha)[l,u]$ iff

(i) $(L, P \cup \{(\beta|\alpha)[p,p]\})$ is not g-coherent for all $p \in [0,l) \cup (u,1]$, and

(ii) $(L, P \cup \{(\beta|\alpha)[p,p]\})$ is g-coherent for all $p \in [l,u]$.

The next result finally shows that g-coherent entailment generalizes p-entailment. 

Theorem 4.7. Let $KB = (L, \{(\psi_1|\phi_1)[1,1], \ldots, (\psi_n|\phi_n)[1,1]\})$ be a g-coherent
probabilistic knowledge base. Then, $KB \mathrel{|\!\sim}^{g} (\beta|\alpha)[1,1]$ iff the conditional knowledge base
$(L, \{\psi_1 \leftarrow \phi_1, \ldots, \psi_n \leftarrow \phi_n\})$ p-entails $\beta \leftarrow \alpha$.



5 Relationship to Default Reasoning with Conditional Objects 

In this section, we relate coherence-based probabilistic reasoning to default reasoning
with conditional objects, which goes back to Dubois and Prade [11,3].

We associate with each set of defaults $D = \{\psi_1 \leftarrow \phi_1, \ldots, \psi_n \leftarrow \phi_n\}$ the set of
conditional events $C_D = \{\psi_1|\phi_1, \ldots, \psi_n|\phi_n\}$. Given a nonempty set of conditional events
$\mathcal{E} = \{\psi_1|\phi_1, \ldots, \psi_n|\phi_n\}$, the quasi-conjunction of $\mathcal{E}$, denoted $QC(\mathcal{E})$, is defined as the
conditional event $(\psi_1 \Leftarrow \phi_1) \wedge \cdots \wedge (\psi_n \Leftarrow \phi_n) \mid \phi_1 \vee \cdots \vee \phi_n$.

We now define the notions of co-consistency and co-entailment as follows. A
conditional knowledge base KB = (L, D) is co-consistent iff, for every nonempty
$D' \subseteq D$, there exists a model I of L such that $I(QC(C_{D'})) = true$. We assume the
total order false < indeterminate < true. We say KB = (L, D) co-entails a default
$\beta \leftarrow \alpha$ iff either (i) $L \cup \{\alpha\} \models \beta$, or (ii) some nonempty $D' \subseteq D$ exists such that
$I(QC(C_{D'})) \le I(\beta|\alpha)$ for all models I of L.

The notions of co-consistency and co-entailment coincide with the notions of p-con- 
sistency and p-entailment, respectively [11,3]. We now show that our results in Sections 3 
and 4 are naturally related to default reasoning with conditional objects. 

It is easy to verify that the following counterpart of Theorem 3.1 for p-consistency
formulates the above notion of co-consistency. Note that the notion of satisfiability used
in this theorem is defined as in Section 4.1.

Theorem 5.1. A conditional knowledge base KB = (L, D) is p-consistent iff
$L \cup D' \cup \{\phi_1 \vee \cdots \vee \phi_n\}$ is satisfiable for every nonempty
$D' = \{\psi_1 \leftarrow \phi_1, \ldots, \psi_n \leftarrow \phi_n\} \subseteq D$.

For conditional knowledge bases KB = (L, D) and events $\alpha$, let $D_\alpha(KB)$ be the set
of all $D' = \{\psi_1 \leftarrow \phi_1, \ldots, \psi_n \leftarrow \phi_n\} \subseteq D$ such that $L \cup D' \cup \{\phi_1 \vee \cdots \vee \phi_n \vee \alpha\} \models \alpha$.
Observe now that for $D' = \{\psi_1 \leftarrow \phi_1, \ldots, \psi_n \leftarrow \phi_n\}$, condition (ii) in the definition of
the notion of co-entailment is equivalent to $L \cup D' \cup \{\phi_1 \vee \cdots \vee \phi_n \vee \alpha\} \models \alpha$ and
$L \cup D' \models \beta \Leftarrow \alpha$. Thus, the following counterpart of Theorem 3.3 for p-entailment
formulates the above notion of co-entailment.

Theorem 5.2. Let KB = (L, D) be a p-consistent conditional knowledge base. Then,
KB p-entails the default $\beta \leftarrow \alpha$ iff $L \cup D' \models \beta \Leftarrow \alpha$ for some $D' \in D_\alpha(KB)$.

Crucially, we can now also formulate counterparts to Lemma 3.4 and Theorems 3.5
and 3.6. To our knowledge, these results for system P are unknown so far. The following
result shows that $D_\alpha(KB)$ contains a unique greatest element.

Lemma 5.3. Let KB = (L, D) be a p-consistent conditional knowledge base, and let
$\alpha$ be an event. Then, $D_\alpha(KB)$ contains a unique greatest element.

The next result shows that p-entailment from KB coincides with logical entailment
from the greatest element in $D_\alpha(KB)$. That is, we can replace item (ii) in the definition
of co-entailment by (ii') $I(QC(C_{D^*})) \le I(\beta|\alpha)$ for the greatest $D^*$ in $D_\alpha(KB)$.

Theorem 5.4. Let KB = (L, D) be a p-consistent conditional knowledge base, and let
$\beta \leftarrow \alpha$ be a default. Let $D^*$ denote the unique greatest element in $D_\alpha(KB)$. Then,
KB p-entails $\beta \leftarrow \alpha$ iff $L \cup D^* \models \beta \Leftarrow \alpha$.

The following theorem shows how $D^*$ can be characterized and thus computed.

Theorem 5.5. Let KB = (L, D) be a p-consistent conditional knowledge base and $\alpha$
be an event. Let $D^* \subseteq D$ and $(D_0, \ldots, D_k)$ be an ordered partition of $D \setminus D^*$ such that:

(i) every $D_i$, $0 \le i \le k$, is the set of all elements in $D_i \cup \cdots \cup D_k \cup D^*$ that are
tolerated under $L \cup \{\bot \Leftarrow \alpha\}$ by $D_i \cup \cdots \cup D_k \cup D^*$, and

(ii) no member of $D^*$ is tolerated under $L \cup \{\bot \Leftarrow \alpha\}$ by $D^*$.

Then, $D^*$ is the greatest element in $D_\alpha(KB)$.
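
Theorems 5.4 and 5.5 together suggest a simple two-step computation, which can be sketched as follows. This is an illustrative sketch only: it assumes brute-force enumeration of worlds over a few atoms, stores a default $\psi \leftarrow \phi$ as a pair of predicates `(psi, phi)`, and uses the hypothetical names `greatest_element` and `p_entails_via_dstar`. The first function peels off the defaults tolerated under L together with the constraint $\bot \Leftarrow \alpha$ (that is, "not alpha"); what is left is taken as $D^*$. The second checks that L and $D^*$ logically entail $\beta \Leftarrow \alpha$.

```python
from itertools import product

def greatest_element(atoms, logic, defaults, alpha):
    """Peel off defaults tolerated under L and 'not alpha'; the rest is D*."""
    all_worlds = [dict(zip(atoms, v))
                  for v in product([False, True], repeat=len(atoms))]
    # Models of L together with the constraint (bottom <= alpha).
    base = [w for w in all_worlds
            if all(c(w) for c in logic) and not alpha(w)]
    remaining = list(range(len(defaults)))
    while True:
        models = [w for w in base
                  if all((not defaults[i][1](w)) or defaults[i][0](w)
                         for i in remaining)]
        tolerated = [i for i in remaining
                     if any(defaults[i][1](w) and defaults[i][0](w)
                            for w in models)]
        if not tolerated:
            return remaining  # indices of the defaults in D*
        remaining = [i for i in remaining if i not in tolerated]

def p_entails_via_dstar(atoms, logic, defaults, beta, alpha):
    """Theorem 5.4: check that L and D* logically entail beta <= alpha."""
    d_star = greatest_element(atoms, logic, defaults, alpha)
    for v in product([False, True], repeat=len(atoms)):
        w = dict(zip(atoms, v))
        satisfies_l = all(c(w) for c in logic)
        satisfies_d = all((not defaults[i][1](w)) or defaults[i][0](w)
                          for i in d_star)
        if satisfies_l and satisfies_d and alpha(w) and not beta(w):
            return False  # counter-model of beta <= alpha
    return True

ATOMS = ["bird", "penguin", "fly"]
KB = [
    (lambda w: w["fly"], lambda w: w["bird"]),         # fly <- bird
    (lambda w: w["bird"], lambda w: w["penguin"]),     # bird <- penguin
    (lambda w: not w["fly"], lambda w: w["penguin"]),  # not fly <- penguin
]
penguin_bird = lambda w: w["penguin"] and w["bird"]
d_star = greatest_element(ATOMS, [], KB, penguin_bird)
```

For the event "penguin and bird", the sketch removes fly <- bird in the first round and returns the two penguin defaults as $D^*$, from which "not fly" follows by classical reasoning.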

6 Summary and Outlook 

We explored the relationship between probabilistic logic under coherence, model-theo- 
retic probabilistic logic, and default reasoning in system P. We showed that coherence- 
based probabilistic reasoning can be reduced to model-theoretic probabilistic reasoning 
by using concepts from default reasoning. Moreover, we showed that it is a probabilistic 
generalization of default reasoning in system P. That is, we gave a new probabilistic 
semantics for system P, which is neither based on infinitesimal probabilities nor on 
atomic-bound (or also big-stepped) probabilities. We finally showed that these results 
also give new insight into default reasoning with conditional objects. 

Roughly speaking, the main difference between coherence-based and model-theo- 
retic probabilistic reasoning is that the former generalizes default reasoning in system P, 
while the latter generalizes classical reasoning in propositional logic. 

A very interesting topic of future research is to explore how other notions of coherence 
are related to model-theoretic probabilistic logic and to default reasoning. It would also 
be very interesting to develop coherence-based probabilistic extensions of notions of 
default reasoning different from system P (for example, in the spirit of [21]).

Abstract. Belief functions may be taken as an alternative to classical
probability theory, as a generalization of this theory, but also as
a non-traditional and sophisticated application of probability theory. In 
this contribution, the idea of numerically quantified degrees of belief is 
abandoned in favour of the case when belief functions take their values 
in partially ordered sets perhaps enriched to lower or upper semilattices. 
Such structures seem to be the most general ones to which reasonable 
and nontrivial parts of the theory of belief functions can be extended 
and generalized. 



1 Introduction, Motivation, Preliminaries 

The degrees of belief quantified by belief functions, together with the mathematical
theory that processes them, sometimes called the Dempster-Shafer theory, provide
an interesting mathematical model and tool for quantifying and processing
uncertainty. Belief functions may be taken, at the same time, as an alternative to
classical probability theory, as a generalization of this theory, but also as a
non-traditional and sophisticated application of probability theory. The shortest
way to the notion of belief function is the formalized combinatorial one: let m be
a probability distribution on the power-set $\mathcal{P}(S)$ of all subsets of a finite set S;
hence, $m : \mathcal{P}(S) \to [0,1]$ is such that $\sum_{A \subset S} m(A) = 1$. In this context, m is called
a basic probability assignment on S (b.p.a.). The (non-normalized) belief
function $bel_m : \mathcal{P}(S) \to [0,1]$ is defined, given $A \subset S$, by

$bel_m(A) = \sum_{\emptyset \neq B \subset A} m(B)$,    (1.1)

applying the convention according to which $bel_m(\emptyset) = 0$ for the empty subset
$\emptyset$ of S. Keeping in mind the idea of a possible generalization of belief functions
to those taking also non-numerical values, only non-normalized belief functions
will be considered below.
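
The definition (1.1) translates almost literally into code. The following minimal sketch assumes that a b.p.a. is stored as a Python dictionary from frozensets to masses, with subsets not listed carrying mass 0; the function name `bel` and the sample assignment `M` are hypothetical and serve only as an illustration.

```python
def bel(m, a):
    """Non-normalized belief: total mass of the nonempty subsets of a."""
    a = frozenset(a)
    return sum(mass for b, mass in m.items() if b and b <= a)

# Hypothetical b.p.a. on S = {x, y, z}; subsets not listed carry mass 0.
M = {
    frozenset(): 0.125,
    frozenset({"x"}): 0.25,
    frozenset({"x", "y"}): 0.125,
    frozenset({"x", "y", "z"}): 0.5,
}

belief_xy = bel(M, {"x", "y"})        # mass of {x} plus mass of {x, y}
belief_s = bel(M, {"x", "y", "z"})    # everything except the mass on the empty set
```

Because the belief function is not normalized, the mass assigned to the empty set is simply left out, so the belief of the whole space S can be smaller than 1.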

The following interpretation brings us to the idea of set-valued mappings,
which are important for our purposes. Let S be the set of all possible internal states
of a system (a number of alternative interpretations being also possible), just
one state $s_0 \in S$ being the actual one, and let E be the space (perhaps a vector
one) of empirical data which may result from some observations, experiments,
measurements, etc. concerning the system in question. All that is known about
the system is expressed by the so-called compatibility relation $\rho : S \times E \to \{0,1\}$
with the following intuition behind it: if $\rho(s,x) = 0$ for some $s \in S$ and $x \in E$,
then s cannot be the actual state of the system supposing that x was observed.
If $\rho(s,x) = 1$, then s cannot be excluded from consideration when observing x; in
other words, s and x are compatible. Given $x \in E$, we can define the subset
$U_\rho(x) = \{s \in S : \rho(s,x) = 1\}$ of states of S which are compatible with x.

The phenomenon of uncertainty enters our model supposing that the empirical
data are of random nature. Namely, we shall suppose that x is the value
taken by a random variable X defined on a fixed probability space $(\Omega, \mathcal{A}, P)$ and
with values in a measurable space $(E, \mathcal{E})$ generated by an appropriate nonempty
$\sigma$-field $\mathcal{E}$ of subsets of E. The composed mapping $U_\rho(X(\cdot)) : \Omega \to \mathcal{P}(S)$ is supposed
to be measurable in the sense that for each $A \subset S$ its inverse image
$(U_\rho(X))^{-1}(A)$ is in $\mathcal{A}$, so that the value

$m(A) = P(\{\omega \in \Omega : U_\rho(X(\omega)) = A\})$    (1.2)

is defined. Hence, the mapping $U_\rho(X(\cdot))$ is supposed to be a random set or, more
correctly, a (generalized set-valued) random variable which takes the probability
space $(\Omega, \mathcal{A}, P)$ into the measurable space $(\mathcal{P}(S), \mathcal{P}(\mathcal{P}(S)))$. S being finite, the
relation (1.2) evidently defines a b.p.a. m on S and we can easily deduce that,
given $A \subset S$,

$bel_m(A) = P(\{\omega \in \Omega : \emptyset \neq U_\rho(X(\omega)) \subset A\})$.    (1.3)

The approach to belief functions through compatibility relations and random
sets has already been proposed, analyzed and investigated many times, and we
have repeated here only the most basic idea for the convenience of the reader
perhaps not familiar with it (cf. some references in the list below).

Leaving aside a number of generalizations of the model just introduced (cf.
[13], e.g., or some more special papers listed there), we shall focus our attention
on the case when the degrees of belief are not quantified by real numbers, as
is the case in (1.1) and (1.3), but rather by elements of some non-numerical
structures which may perhaps better reflect the nature of uncertainty in various
particular cases. E.g., the degrees of belief need not be always dichotomic, i.e.,
some pairs of degrees of belief need not be comparable by the relation "greater
than or equal to" without introducing some new and ontologically independent
principles and accepting all the risks connected with such a step. Perhaps the first
non-numerical structure that comes to mind as a good tool for this purpose is a
Boolean algebra, in particular, the Boolean algebra of all subsets of a fixed space
with respect to the standard set-theoretic operations (cf. [11] and some papers
listed there).

Going further with this reasoning, we arrive at the key problem of this paper:
what are the most general and simplest conditions which the structure
over the degrees of uncertainty should meet in order to be able to develop a
non-trivial fragment of the common theory of belief functions within the new
framework? The aim of this contribution is to argue in favour of the idea that
the degrees of belief should define a partially ordered set, perhaps enriched to
an upper or lower semilattice or to a lattice. The kind of results presented
below implies that one should not expect qualitatively new and perhaps
surprising ones. On the other hand, let us hope that the achieved results are still
interesting enough to justify our effort.

Let us close this chapter by recalling some of the most elementary notions
concerning partially ordered sets. A quasi-partially ordered set is a pair $(T, \preceq)$, where
T is a nonempty set and $\preceq$ is a reflexive and transitive binary relation on T.
If $\preceq$ is, moreover, antisymmetric, the pair $(T, \preceq)$ is called a partially ordered
(p.o.) set induced in T by the partial ordering (relation) $\preceq$. Each quasi-p.o. set
$(T, \preceq)$ can be easily converted into a partially ordered set over the equivalence
classes $T/\approx$, where $x \approx y$ holds iff $x \preceq y$ and $y \preceq x$ hold simultaneously. Given
a p.o. set $(T, \preceq)$ and a nonempty subset $A \subset T$, the supremum $\bigvee_{x \in A} x$ and the
infimum $\bigwedge_{x \in A} x$ ($\bigvee A$ and $\bigwedge A$, for short) are defined in the standard way,
even if these values need not be always defined. If $A = \{x_1, x_2, \ldots, x_n\}$, we shall
write $x_1 \vee x_2 \vee \cdots \vee x_n$ and $x_1 \wedge x_2 \wedge \cdots \wedge x_n$ instead of $\bigvee A$ and $\bigwedge A$. If $\bigvee T$
($\bigwedge T$, resp.) is defined, it is called the unit (zero, resp.) element of the p.o. set
$(T, \preceq)$ and denoted by $1_T$ ($0_T$, resp.). In this case the definition of supremum
and infimum can be extended also to the empty subset $\emptyset$ of T, setting $\bigwedge \emptyset = 1_T$
and $\bigvee \emptyset = 0_T$. The definition of supremum and infimum can be extended also to
quasi-p.o. sets, but in this case, if $\bigvee A$ and/or $\bigwedge A$ are defined, they are defined
up to the equivalence relation $\approx$.

The reader is supposed to be familiar with the most elementary properties of
p.o. sets; otherwise she/he may consult [1,4] or another elementary textbook or monograph.

2 Set Structures over Partially Ordered Sets and 
Complete Upper Semilattices 

In this chapter we shall build a structure of partial ordering over the power-set
$\mathcal{P}(T)$ of all subsets of T, which extends conservatively the properties of
partial ordering in T, and which can be totally embedded into the p.o. set $(T, \preceq)$
supposing that this p.o. set is complete in the sense that $\bigvee A$ and $\bigwedge A$ are defined
for all $A \subset T$. So, given a p.o. set $(T, \preceq)$, let us define a binary relation $\sqsubseteq$ on
$\mathcal{P}(T) = \{A : A \subset T\}$ in such a way that $A \sqsubseteq B$ holds for $A, B \subset T$ iff, for each
$\emptyset \neq S_1 \subset A$ such that $\bigvee S_1$ exists, there exists $S_2 \subset B$ such that $\bigvee S_2$ is defined and
the relation $\bigvee S_1 \preceq \bigvee S_2$ holds. The following assertion can be easily proved (cf.
[12] for the details of the proofs of all the statements presented below).
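
For a finite p.o. set, the relation just defined can be checked by brute force. The sketch below is an illustration under stated assumptions, not part of the theory: the p.o. set is the set of divisors of 12 ordered by divisibility, only nonempty subsets $S_1$ and $S_2$ are considered, and the names `nonempty_subsets`, `supremum`, `below` and `divides` are hypothetical.

```python
from itertools import combinations

def nonempty_subsets(items):
    """All nonempty subsets of a finite collection."""
    items = sorted(items)
    return [set(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

def supremum(subset, t, leq):
    """Least upper bound of subset in (t, leq), or None if none exists."""
    uppers = [u for u in t if all(leq(x, u) for x in subset)]
    least = [u for u in uppers if all(leq(u, v) for v in uppers)]
    return least[0] if least else None

def below(a, b, t, leq):
    """A below B: every existing supremum of a nonempty part of A lies
    under the supremum of some nonempty part of B."""
    sups_b = [s for s in (supremum(p, t, leq) for p in nonempty_subsets(b))
              if s is not None]
    return all(
        any(leq(s1, s2) for s2 in sups_b)
        for s1 in (supremum(p, t, leq) for p in nonempty_subsets(a))
        if s1 is not None
    )

def divides(x, y):
    """The partial order used in this example: x divides y."""
    return y % x == 0

T = [1, 2, 3, 4, 6, 12]  # divisors of 12

equivalent = below({2, 3}, {6}, T, divides) and below({6}, {2, 3}, T, divides)
four_below = below({4}, {2, 3}, T, divides)
```

Here {2, 3} and {6} are related in both directions, since both have supremum 6, whereas {4} is not below {2, 3} because 4 does not divide 2, 3 or 6.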

Lemma 2.1. The relation $\sqsubseteq$ is a quasi-partial ordering on $\mathcal{P}(T)$ which extends
conservatively the set-theoretic inclusion in $\mathcal{P}(T)$, i.e., $A \subset B \subset T$ implies that
$A \sqsubseteq B$. □

Using the standard construction mentioned above, we introduce the equivalence
relation $\approx$ on $\mathcal{P}(T)$, setting $A \approx B$ iff $A \sqsubseteq B$ and $B \sqsubseteq A$ hold simultaneously
for $A, B \subset T$. Abusing the symbol $\sqsubseteq$, we may extend it to the equivalence
classes $[A] \in \mathcal{P}(T)/\approx$, where $[A] = \{B \subset T : B \approx A\}$. So, $[A] \sqsubseteq [B]$ holds iff
$A \sqsubseteq B$ holds; the validity of this relation clearly does not depend on the choice of
representatives of the classes [A] and [B]. As can be easily proved, for each $A \subset T$
such that $\bigvee A$ is defined the identity $[A] = [\{\bigvee A\}]$ holds, hence, the subsets of T
which possess a supremum in T are completely represented, up to the equivalence
relation $\approx$, by this supremum value. Given a system $\mathcal{A} \subset \mathcal{P}(T)$ of subsets of T,
we denote by $\bigsqcup \mathcal{A}$ ($\bigsqcap \mathcal{A}$, resp.) the supremum (infimum, resp.) of this system of
sets with respect to the partial ordering $\sqsubseteq$ on $\mathcal{P}(T)$. In general, the values $\bigsqcup \mathcal{A}$
and $\bigsqcap \mathcal{A}$ need not be defined, but if they exist, they are defined uniquely up to
the equivalence relation $\approx$. As can be easily seen, given $\mathcal{A} \subset \mathcal{P}(T)$ and denoting
by $\bigcup \mathcal{A} = \bigcup_{A \in \mathcal{A}} A$ the union of all sets from $\mathcal{A}$, the relation $\bigsqcup \mathcal{A} \in [\bigcup \mathcal{A}]$ holds.
If $\bigvee T = 1_T$ is defined, then the relation $\emptyset \sqsubseteq A \sqsubseteq T$ holds for each $A \subset T$ (the
relation $\emptyset \sqsubseteq A$ holds due to the trivial fact that there is no nonempty $S \subset \emptyset$, so
that the antecedent of the corresponding relation is always false).

Definition 2.1. A partially ordered set $\mathcal{T} = (T, \preceq)$ is called an upper semilattice,
if for each $t_1, t_2 \in T$ their supremum $t_1 \vee t_2 \in T$ is defined. $\mathcal{T}$ is called a lower
semilattice, if for each $t_1, t_2 \in T$ their infimum $t_1 \wedge t_2 \in T$ is defined. $\mathcal{T}$ is called a
complete upper semilattice, if for each $\emptyset \neq A \subset T$ the supremum $\bigvee A \in T$ is
defined. $\mathcal{T}$ is called a complete lower semilattice, if for all $\emptyset \neq A \subset T$ the infimum
$\bigwedge A \in T$ is defined.

E.g., each complete Boolean algebra, hence, in particular, each power set over
a nonempty set, together with the set-theoretic relations and operations of
inclusion, union and intersection, is at the same time a complete upper semilattice
and a complete lower semilattice. Every p.o. set which possesses this property,
i.e., which is simultaneously a (complete) upper semilattice and a (complete)
lower semilattice, is called a (complete) lattice. More generally, each Boolean
algebra is an upper and lower semilattice, hence, a lattice.

Theorem 2.1. Let $\mathcal{T} = (T, \preceq)$ be a complete upper semilattice. Then

(i) for each $t \in T$, $[\{t\}] = \{A \subset T : \bigvee A = t\}$,

(ii) for each $A \subset T$, $[A] = [\{\bigvee A\}]$, hence $[A] = \{B \subset T : \bigvee B = \bigvee A\}$,

(iii) for each $A, B \subset T$, $[A] \sqsubseteq [B]$ holds iff $\bigvee A \preceq \bigvee B$ holds,

(iv) for each $A, B \subset T$, $[A] \sqcup [B] = [A \cup B]$, if $[A] \sqcup [B]$ is defined,

(v) for each $A, B \subset T$, $[A \cap B] \sqsubseteq [A] \sqcap [B]$, if $[A] \sqcap [B]$ is defined,

(vi) if $\mathcal{T}$ is, moreover, a lower semilattice then, for each $A, B \subset T$,
$[A] \sqcap [B] = [\{(\bigvee A) \wedge (\bigvee B)\}]$. □

Proof. The assertions are more or less evident; detailed proofs can be found in
[12]. □

3 Belief Functions with Values in Partially Ordered Sets 

Let S be a nonempty set, let $\mathcal{T} = (T, \preceq)$ be a p.o. set, let $\vee$ and $\wedge$ denote
the (partial, in general) supremum and infimum operations in T induced by the
partial ordering relation $\preceq$, let $\mathcal{B}_T = (\mathcal{P}(T), \cup, \cap, T - \cdot)$ be the Boolean algebra
induced in the power-set $\mathcal{P}(T)$ of all subsets of T by the standard set-theoretic
operations of union ($\cup$), intersection ($\cap$) and complement ($T - \cdot$).

Definition 3.1. A $\mathcal{B}_T$-valued basic possibilistic assignment on S ($\mathcal{B}_T$-b.poss.a. on S,
for short) is a mapping $\pi : \mathcal{P}(S) \to \mathcal{P}(T)$, i.e., $\pi(A) \subset T$ for all $A \subset S$, such
that $\bigcup_{A \subset S} \pi(A) = T$. A $\mathcal{B}_T$-b.poss.a. $\pi$ is called compact, if there exists $A \subset S$ such
that $\pi(A) = T$. A $\mathcal{B}_T$-b.poss.a. $\pi$ is called a $\mathcal{B}_T$-basic probabilistic assignment on S,
if $\pi(A) \cap \pi(B) = \emptyset$ for all $A, B \subset S$ such that $A \cap B = \emptyset$. The $\mathcal{B}_T$-(valued) belief
function defined by a $\mathcal{B}_T$-b.poss.a. $\pi$ on S is the mapping $BEL_\pi : \mathcal{P}(S) \to \mathcal{P}(T)$
ascribing to each $\emptyset \neq A \subset S$ the subset

$BEL_\pi(A) = \bigcup_{\emptyset \neq B \subset A} \pi(B)$    (3.1)

of T; by convention, $BEL_\pi(\emptyset) = \emptyset$ for the empty subset of S.

The properties of belief functions taking their values (degrees of belief) in a
Boolean algebra are investigated at a more general level, and in more detail, in
[11], so that we shall refer to the corresponding results and statements without
repeating their proofs. Here we shall take into consideration the fact that the
values $\pi(A)$ and $BEL_\pi(A)$, $A \subset S$, are subsets of the p.o. set T, so that they
can be subjected to the quasi-partial ordering relation $\sqsubseteq$ defined on $\mathcal{P}(T)$ and
extended to the equivalence classes from the factor-space $\mathcal{P}(T)/\approx$. Lemma 2.1
and Theorem 2.1 yield immediately that, for each $A \subset B \subset S$, the relation
$BEL_\pi(A) \sqsubseteq BEL_\pi(B)$ holds. For every $A, B \subset S$ we can deduce that

$[BEL_\pi(A)] \sqcup [BEL_\pi(B)] \sqsubseteq [BEL_\pi(A \cup B)]$    (3.2)

holds supposing that $[BEL_\pi(A)] \sqcup [BEL_\pi(B)]$ is defined.

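The mapping $BEL_\pi : \mathcal{P}(S) \to \mathcal{P}(T)$, defined by (3.1), easily induces the
mapping $BEL^*_\pi : \mathcal{P}(S) \to \mathcal{P}(T)/\approx$, setting simply, given $A \subset S$,

$BEL^*_\pi(A) = [BEL_\pi(A)] = \{R \subset T : R \approx BEL_\pi(A)\} = \{R \subset T : R \sqsubseteq BEL_\pi(A) \text{ and } BEL_\pi(A) \sqsubseteq R\}$.    (3.3)

Similarly, the b.poss.a. $\pi : \mathcal{P}(S) \to \mathcal{P}(T)$ induces the mapping $\pi^* : \mathcal{P}(S) \to \mathcal{P}(T)/\approx$
such that, for each $A \subset S$,

$\pi^*(A) = [\pi(A)] = \{R \subset T : R \sqsubseteq \pi(A) \text{ and } \pi(A) \sqsubseteq R\}$.    (3.4)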

The inclusion $\pi(C) \subset BEL_\pi(A)$, valid by definition for every $\emptyset \neq C \subset A$, implies
immediately, using Lemma 2.1, that for each $A \subset S$ the relation

$\bigsqcup_{\emptyset \neq C \subset A} \pi^*(C) \sqsubseteq BEL^*_\pi(A)$    (3.5)

holds. The following lemma specifies the conditions under which the $\sqsubseteq$-inclusion
in (3.5) can be replaced by equality.

Lemma 3.1. Let $(T, \preceq)$ be a complete upper semilattice, let $\pi : \mathcal{P}(S) \to \mathcal{P}(T)$
be a $\mathcal{B}_T$-valued b.poss.a. on S. Then for each finite $A \subset S$

$BEL^*_\pi(A) = \bigsqcup_{\emptyset \neq C \subset A} \pi^*(C)$.    (3.6)

In particular, if the whole space S is finite, then

$\bigsqcup_{A \subset S} \pi^*(A) = [T]$.    (3.7)

□

Proof. The assertion follows from Theorem 2.1 (iv), supposing that this statement
is easily extended to any finite nonempty system $\mathcal{A} \subset \mathcal{P}(T)$ of subsets
of T. □

(3.7) may also be read as saying that the mapping $\pi^*$ is a basic possibilistic
assignment on S taking its values in the factor-space $\mathcal{P}(T)/\approx$. The
relation (3.6) then enables us to understand the mapping $BEL^*_\pi$ as the belief
function defined by the b.poss.a. $\pi^*$. The condition that $(T, \preceq)$ is a complete upper
semilattice seems to be the weakest one imposed on the set of values of the
b.poss.a. $\pi^*$ under which the basic philosophy underlying the idea of belief
functions can be applied.



4 Dempster Combination Rule for Partially Ordered 
Degrees of Belief 

Within the framework of the classical Dempster-Shafer theory of belief functions,
the Dempster combination rule is defined as follows. Let S be a finite nonempty set,
let $m_1$, $m_2$ be basic probability assignments on S, i.e., probability distributions
on the power-set $\mathcal{P}(S)$ of all subsets of S. Let $m_{12} : \mathcal{P}(S) \to [0,1]$ be the mapping
defined by

$m_{12}(A) = \sum_{B, C \subset S,\, B \cap C = A} m_1(B)\, m_2(C)$    (4.1)

for each $A \subset S$. As can be easily proved, $m_{12}$ is also a basic probability
assignment on S, denoted also by $m_1 \oplus m_2$ and called the Dempster product of $m_1$
and $m_2$. (4.1) is then called the Dempster combination rule for basic probability
assignments. The (non-normalized) belief function defined by a basic probability
assignment m on S is the mapping $bel_m : \mathcal{P}(S) \to [0,1]$ such that

$bel_m(A) = \sum_{\emptyset \neq B \subset A} m(B)$    (4.2)

for all $A \subset S$, $bel_m(\emptyset) = 0$ by convention. The Dempster product is defined also for
(non-normalized) belief functions, setting simply

$bel_{m_1} \oplus bel_{m_2} =_{df} bel_{m_1 \oplus m_2}$.    (4.3)

As analyzed in more detail in [8] or elsewhere, the Dempster combination rule
is legitimate supposing that the compatibility relations $\rho_1$, $\rho_2$ of the two
subjects in question are composed by the operation of minimum, i.e.,
$\rho_{12}(s,x) = \min\{\rho_1(s,x), \rho_2(s,x)\}$ for every $s \in S$ and $x \in E$, and that the set-valued
random variables (random sets) $U_{\rho_1}(X(\cdot))$ and $U_{\rho_2}(X(\cdot))$ are statistically
(stochastically) independent (as the random variable X may be of vector character, we
can suppose that it is common for both the subjects in spite of perhaps different
nature of their empirical data).
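
The non-normalized Dempster product and the induced belief function can be sketched in a few lines. The sketch assumes the usual reading of the rule, in which the product $m_1(B)\,m_2(C)$ is credited to the intersection $B \cap C$, including the empty intersection; b.p.a.'s are stored as dictionaries from frozensets to masses, and the names `dempster_product`, `bel`, `M1` and `M2` are hypothetical.

```python
def dempster_product(m1, m2):
    """Non-normalized Dempster product: mass of B and C goes to B & C."""
    m12 = {}
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            a = b & c
            m12[a] = m12.get(a, 0.0) + mass_b * mass_c
    return m12

def bel(m, a):
    """Non-normalized belief: total mass of the nonempty subsets of a."""
    a = frozenset(a)
    return sum(mass for b, mass in m.items() if b and b <= a)

# Two hypothetical b.p.a.'s on S = {x, y}.
M1 = {frozenset({"x"}): 0.5, frozenset({"x", "y"}): 0.5}
M2 = {frozenset({"y"}): 0.25, frozenset({"x", "y"}): 0.75}

M12 = dempster_product(M1, M2)
conflict = M12[frozenset()]        # mass credited to the empty set
belief_s = bel(M12, {"x", "y"})    # belief of the whole space under M12
```

In this example the conflicting pair {x}, {y} sends mass 0.125 to the empty set, so the combined masses still sum to 1 while the belief of the whole space drops to 0.875.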

For Boolean-valued basic probabilistic assignments and belief functions induced
by them, the Dempster combination rule can be rewritten in such a way that
summations are routinely replaced by suprema and products by infima. Hence,
for $\mathcal{B}_T$-valued b.poss.a.'s $\pi_1$, $\pi_2$ on S we obtain that

$(\pi_1 \oplus \pi_2)(A) = \bigcup_{B, C \subset S,\, B \cap C = A} (\pi_1(B) \cap \pi_2(C))$    (4.4)

for each $A \subset S$. As can be easily proved,

$\bigcup_{A \subset S} (\pi_1 \oplus \pi_2)(A) = T$,    (4.5)

so that $\pi_1 \oplus \pi_2$ is also a $\mathcal{B}_T$-valued b.poss.a. on S.
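
The set-valued rule can be illustrated in the same style as its numerical counterpart, with unions in place of sums and intersections in place of products. The following sketch makes several assumptions for the sake of illustration: S and T are small finite sets, a b.poss.a. is a dictionary from frozensets (subsets of S) to frozensets (subsets of T) with missing entries read as the empty set, and the names `powerset`, `set_dempster`, `PI1` and `PI2` are hypothetical.

```python
from itertools import combinations

def powerset(items):
    """All subsets of a finite collection, as frozensets."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def set_dempster(pi1, pi2, s):
    """Union over all B, C with B & C = A of pi1(B) & pi2(C)."""
    result = {a: frozenset() for a in powerset(s)}
    for b in powerset(s):
        for c in powerset(s):
            piece = pi1.get(b, frozenset()) & pi2.get(c, frozenset())
            result[b & c] = result[b & c] | piece
    return result

S = {"x", "y"}
T = frozenset({1, 2, 3, 4})
# Hypothetical assignments whose values cover T.
PI1 = {frozenset({"x"}): frozenset({1, 2}),
       frozenset({"x", "y"}): frozenset({3, 4})}
PI2 = {frozenset({"y"}): frozenset({1, 3}),
       frozenset({"x", "y"}): frozenset({2, 4})}

PI12 = set_dempster(PI1, PI2, S)
covered = frozenset().union(*PI12.values())
```

In this example each of the four subsets of S receives exactly one element of T, and the values of the product again cover the whole of T, as the covering property of a b.poss.a. requires.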

Keeping in mind the interpretation introduced in Chapter 1, we arrive at the
set-valued mapping $U_\rho(\cdot) : E \to \mathcal{P}(S)$, where $U_\rho(x) = \{s \in S : \rho(s,x) = 1\}$. In
order to introduce Boolean-valued uncertainty degrees into our model, consider a
complete Boolean algebra $\mathcal{B} = (B, \vee, \wedge, \neg)$ and a $\mathcal{B}$-valued complete possibilistic
space $(\Omega, \mathcal{P}(\Omega), \Pi_0)$. Hence, $\Omega$ is a nonempty set, $\mathcal{P}(\Omega)$ is the power-set of all
subsets of $\Omega$ and $\Pi_0$ is a $\mathcal{B}$-valued complete possibilistic measure on $\Omega$, so that
$\Pi_0$ takes $\mathcal{P}(\Omega)$ into B in such a way that $\Pi_0(\emptyset) = 0_{\mathcal{B}}$, $\Pi_0(\Omega) = 1_{\mathcal{B}}$, and

$\Pi_0(\bigcup_{A \in \mathcal{R}} A) = \bigvee_{A \in \mathcal{R}} \Pi_0(A)$    (4.6)

for every nonempty system $\mathcal{R}$ of subsets of $\Omega$. Taking the empirical value $x \in E$
as that of a mapping $X : \Omega \to E$, the composed mapping $U_\rho(X(\cdot))$ takes $\Omega$ into
$\mathcal{P}(S)$ so that, given $A \subset S$, we can define

$\pi(A) = \Pi_0(\{\omega \in \Omega : U_\rho(X(\omega)) = A\})$.    (4.7)

As can be easily proved, $\bigvee_{A \subset S} \pi(A) = \Pi_0(\Omega) = 1_{\mathcal{B}}$, so that $\pi$ is a $\mathcal{B}$-valued basic
possibilistic assignment on S. If S is finite, the completeness of $\Pi_0$ is not necessary.

Consider two subjects operating over the same empirical space E and complete
possibilistic space $(\Omega, \mathcal{P}(\Omega), \Pi_0)$, both using the same, possibly vector-like
empirical value $x \in E$ taken by a variable $X : \Omega \to E$, but with perhaps different
compatibility relations $\rho_1$ and $\rho_2$. Let

$\rho_{12}(s,x) = \min\{\rho_1(s,x), \rho_2(s,x)\}$,    (4.8)

so that

$U_{\rho_{12}}(x) = U_{\rho_1}(x) \cap U_{\rho_2}(x)$    (4.9)

holds for every $x \in E$. Applying (4.7) we obtain that

$(\pi_1 \oplus \pi_2)(A) = \pi_{12}(A) = \Pi_0(\{\omega \in \Omega : U_{\rho_{12}}(X(\omega)) = A\}) =$    (4.10)

$= \bigvee_{B \cap C = A} \Pi_0(\{\omega \in \Omega : U_{\rho_1}(X(\omega)) = B\} \cap \{\omega \in \Omega : U_{\rho_2}(X(\omega)) = C\})$.

Hence, if $\Pi_0$ is a homomorphism which takes the Boolean algebra
$\mathcal{B}_\Omega = (\mathcal{P}(\Omega), \cup, \cap, \Omega - \cdot)$ of all subsets of $\Omega$ into $\mathcal{B}$ in such a way that

$\Pi_0((U_{\rho_1}(X))^{-1}(B) \cap (U_{\rho_2}(X))^{-1}(C)) =$    (4.11)

$= \Pi_0((U_{\rho_1}(X))^{-1}(B)) \wedge \Pi_0((U_{\rho_2}(X))^{-1}(C))$

holds for all $B, C \subset S$, in particular, if $\Omega = T$ and $\Pi_0$ is the identity mapping,
then

$(\pi_1 \oplus \pi_2)(A) = \bigvee_{B \cap C = A} (\pi_1(B) \wedge \pi_2(C))$,    (4.12)

introducing the $\mathcal{B}$-valued b.poss.a.'s $\pi_1$ and $\pi_2$ in the same way as $\pi$ is defined by (4.7).
Using terms similar to those of the classical probabilistic case, we can say that
if (4.11) is valid, the set-valued mappings $U_{\rho_1}(X(\cdot))$ and $U_{\rho_2}(X(\cdot))$ are
possibilistically independent. The notion of possibilistic independence is analyzed,
compared with that of statistical (stochastical) independence, and discussed in
more detail in [9].

Let us consider, again, a partially ordered set $\mathcal{T} = (T, \preceq)$, a nonempty set S,
a $\mathcal{B}_T$-b.poss.a. $\pi$ on S, the $\mathcal{B}_T$-valued belief function $BEL_\pi$ defined by (3.1), and
the induced b.poss.a. $\pi^*$ and belief function $BEL^*_\pi$ defined by (3.4) and (3.6).
Using a more or less routine way of reasoning (cf. Theorem 7.1 in [12] for details),
we arrive at the following statement.

Theorem 4.1. Let $\pi_1$, $\pi_2$ be $\mathcal{B}_T$-valued b.poss.a.'s, let their Dempster product
$\pi_1 \oplus \pi_2$ be defined by (4.4). Let the Dempster product of the induced
$\mathcal{P}(T)/\approx$-valued b.poss.a.'s $\pi_1^*$, $\pi_2^*$ be defined by

$(\pi_1^* \oplus \pi_2^*)(A) =_{df} (\pi_1 \oplus \pi_2)^*(A) = [(\pi_1 \oplus \pi_2)(A)]$    (4.13)

for all $A \subset S$. If $\mathcal{T} = (T, \preceq)$ is a complete upper semilattice and a lower
semilattice, and if

$\bigvee(\pi_1(B) \cap \pi_2(C)) = (\bigvee \pi_1(B)) \wedge (\bigvee \pi_2(C))$    (4.14)

holds for each $B, C \subset S$, then the relation

$(\pi_1^* \oplus \pi_2^*)(A) = \bigsqcup_{B \cap C = A} [\pi_1^*(B) \sqcap \pi_2^*(C)]$    (4.15)

is valid for all $A \subset S$. □

Remark. The relation (4.15) can be obtained also when adapting (3.1) routinely
to the case of the partially ordered set $(\mathcal{P}(T)/\approx, \sqsubseteq)$ with its supremum ($\sqcup$) and
infimum ($\sqcap$) operations. Theorem 4.1 then makes explicit the conditions under
which such a formal rewriting is legitimate. The relation

$\bigvee(\pi_1(B) \cap \pi_2(C)) \preceq (\bigvee \pi_1(B)) \wedge (\bigvee \pi_2(C))$    (4.16)

holds in general, so that the weakened version of (4.15), namely,

$(\pi_1^* \oplus \pi_2^*)(A) \sqsubseteq \bigsqcup_{B \cap C = A} [\pi_1^*(B) \sqcap \pi_2^*(C)]$,    (4.17)

can be proved without the assumption (4.14).

Let us introduce a particular case when (4.14) holds true. Let S and W
be nonempty sets, let $\pi_0 : \mathcal{P}(S) \to \mathcal{P}(W)$ be a Boolean-valued b.poss.a., so
that $\bigcup_{A \subset S} \pi_0(A) = W$. Set $\mathcal{T} = (\mathcal{P}(W), \subset)$ and define a mapping
$\lambda : \mathcal{P}(W) \to \mathcal{P}(\mathcal{P}(W))$ $(= \mathcal{P}(T))$ in this way: for each $W_0 \subset W$,

$\lambda(W_0) = \{\{w\} : w \in W_0\} \subset \mathcal{P}(W)$.    (4.18)

In particular, given $A \subset S$, set $\pi(A) = \lambda(\pi_0(A)) \subset \mathcal{P}(W) = T$. An easy
reasoning yields that $\bigvee \mathcal{A} = \bigcup \mathcal{A}$ and $\bigwedge \mathcal{A} = \bigcap \mathcal{A}$ for every $\mathcal{A} \subset \mathcal{P}(W)$. For the values
$\pi(A)$ we obtain that $\bigvee \pi(A) = \pi_0(A)$ for every $A \subset S$, $\bigwedge \pi(A) = \pi_0(A)$, if $\pi_0(A)$
is a singleton, i.e., if $\pi_0(A) = \{w_0\}$ for some $w_0 \in W$, and $\bigwedge \pi(A) = \emptyset$ otherwise,
as $\{w_1\} \cap \{w_2\} = \emptyset$ for every $w_1 \neq w_2$, $w_1, w_2 \in W$. An easy calculation yields
that

$\bigcup_{A \subset S} \pi(A) = \lambda(W) = \lambda(1_{(\mathcal{P}(W), \subset)})$,    (4.19)

as W is the unit element of the p.o. set $(\mathcal{P}(W), \subset)$. Hence, in this sense $\pi$ is a
$\mathcal{P}(T)$-valued Boolean basic possibilistic assignment on S.

Let us consider two $\mathcal{P}(W)$-valued b.poss.a.'s $\pi_{01}$, $\pi_{02}$ on S, let
$\pi_1, \pi_2 : \mathcal{P}(S) \to \mathcal{P}(T)$ be defined by $\pi_i(A) = \lambda(\pi_{0i}(A))$ for both i = 1, 2 and for all
$A \subset S$. Given $B, C \subset S$, we obtain that

$\bigvee(\pi_1(B) \cap \pi_2(C)) = \pi_{01}(B) \cap \pi_{02}(C) = (\bigvee \pi_1(B)) \wedge (\bigvee \pi_2(C))$,    (4.20)

so that (4.14) holds in this particular case.

5 Nonspecificity Degrees and Dempster Rule 

Leaving aside a number of perhaps more important and more deeply going prob- 
lems concerning to the Dempster combination rule, let us focus our attention to 
the following quite legitimate question: whether, and in which sense and degree, 
the quality of a basic probability or basic possibilistic assignment is improved 
when combined with another such assignment? 

Let $S$ be a finite set, let $m : \mathcal{P}(S) \to [0, 1]$ be a basic
probability assignment. The nonspecificity degree $W(m)$ is defined by

$$W(m) = \sum_{A \subseteq S} \frac{\|A\|}{\|S\|}\, m(A). \tag{5.1}$$

For two b.p.a.’s $m_1, m_2$ we can prove (cf. [8] together with a detailed
discussion including the intuition behind) that the inequality

$$W(m_1 \oplus m_2) \leq W(m_1) \wedge W(m_2) \tag{5.2}$$

holds, where $\wedge$ denotes the (standard) infimum in $[0, 1]$. For the dual
Dempster rule $\otimes$, induced by the compatibility relation $\rho_{12}(s, x)
= \rho_1(s, x) \vee \rho_2(s, x)$, where $\vee$ stands for supremum in $[0, 1]$,
we obtain that

$$W(m_1 \otimes m_2) \geq W(m_1) \vee W(m_2) \tag{5.3}$$

holds. These inequalities can be generalized to

$$W_\lambda(m_1 \oplus m_2) \leq W_\lambda(m_1) \wedge W_\lambda(m_2), \tag{5.4}$$

$$W_\lambda(m_1 \otimes m_2) \geq W_\lambda(m_1) \vee W_\lambda(m_2), \tag{5.5}$$

where

$$W_\lambda(m) = \sum_{A \subseteq S} \lambda(A)\, m(A) \tag{5.6}$$

and $\lambda$ is a fuzzy measure on $S$, i.e., $\lambda : \mathcal{P}(S) \to
[0, 1]$, $\lambda(\emptyset) = 0$, $\lambda(S) = 1$, and $\lambda(A) \leq
\lambda(B)$ for every $A \subseteq B \subseteq S$.

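The last approach can be shifted to the case of non-numerical b.poss.a.’s as
follows. Let $\pi_1, \pi_2$ be Boolean-valued basic possibilistic assignments
defined on $S$ and taking their values in $\mathcal{P}(T)$, let for every
$A \subseteq S$

$$(\pi_1 \oplus \pi_2)(A) = \bigcup_{B, C \subseteq S,\ B \cap C = A} (\pi_1(B) \cap \pi_2(C)), \tag{5.7}$$

$$(\pi_1 \otimes \pi_2)(A) = \bigcup_{B, C \subseteq S,\ B \cup C = A} (\pi_1(B) \cap \pi_2(C)). \tag{5.8}$$

Let $\lambda : \mathcal{P}(S) \to \mathcal{P}(T)$ be a $\mathcal{P}(T)$-valued
Boolean fuzzy measure on $S$, i.e., $\lambda(\emptyset) = \emptyset$,
$\lambda(S) = T$, and $\lambda(A) \subseteq \lambda(B)$ holds for each
$A \subseteq B \subseteq S$. The $\mathcal{P}(T)$-valued Boolean nonspecificity
degree $W_\lambda(\pi)$ of a Boolean-valued b.poss.a. $\pi$ with values in
$\mathcal{P}(T)$ is then defined by

$$W_\lambda(\pi) = \bigcup_{A \subseteq S} (\lambda(A) \cap \pi(A)). \tag{5.9}$$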

The next assertion more or less immediately follows (cf. [12] for the details of 
the proof). 

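Theorem 5.1. For each Boolean-valued b.poss.a.’s $\pi_1, \pi_2$ on $S$ the set
inclusions

$$W_\lambda(\pi_1 \oplus \pi_2) \subseteq W_\lambda(\pi_1) \cap W_\lambda(\pi_2), \tag{5.10}$$

$$W_\lambda(\pi_1 \otimes \pi_2) \supseteq W_\lambda(\pi_1) \cup W_\lambda(\pi_2) \tag{5.11}$$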

are valid. 
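
The two inclusions can be tested numerically on a small randomly generated instance. The sketch below is an illustration only, under the following assumptions: the sets `S` and `T`, the random generators, and all function names are hypothetical; the assignments are built so that their values cover `T`, and the measure is built as a union of per-element pieces so that it is monotone with value `T` on `S`.

```python
import random
from itertools import combinations


def powerset(elements):
    items = sorted(elements)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def combine(pi1, pi2, subsets, op):
    """Rule (5.7) when op is intersection, rule (5.8) when op is union."""
    out = {a: frozenset() for a in subsets}
    for b in subsets:
        for c in subsets:
            out[op(b, c)] |= pi1[b] & pi2[c]
    return out


def nonspecificity(lam, pi, subsets):
    """W_lambda(pi) as in (5.9): union over A of lambda(A) intersected with pi(A)."""
    out = frozenset()
    for a in subsets:
        out |= lam[a] & pi[a]
    return out


def random_assignment(subsets, T, rng):
    """Hypothetical generator: every t in T lands in pi(A) for at least one A."""
    pi = {a: set() for a in subsets}
    for t in sorted(T):
        for a in rng.sample(subsets, rng.randint(1, 3)):
            pi[a].add(t)
    return {a: frozenset(v) for a, v in pi.items()}


def random_measure(S, subsets, T, rng):
    """Hypothetical generator: lambda(A) is the union of pieces tau(s), s in A."""
    tau = {s: set() for s in S}
    for t in sorted(T):
        tau[rng.choice(sorted(S))].add(t)
    return {a: frozenset().union(*(tau[s] for s in a)) for a in subsets}


rng = random.Random(0)
S, T = frozenset("abc"), frozenset(range(5))
subsets = powerset(S)
pi1 = random_assignment(subsets, T, rng)
pi2 = random_assignment(subsets, T, rng)
lam = random_measure(S, subsets, T, rng)

w1 = nonspecificity(lam, pi1, subsets)
w2 = nonspecificity(lam, pi2, subsets)
w_meet = nonspecificity(lam, combine(pi1, pi2, subsets, frozenset.intersection), subsets)
w_join = nonspecificity(lam, combine(pi1, pi2, subsets, frozenset.union), subsets)

print(w_meet <= (w1 & w2), w_join >= (w1 | w2))
```

The second inclusion relies on the covering condition imposed on the assignments, which is why the generator above enforces it.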

Lemma 2.1 immediately yields that, under the notation and conditions of
Theorem 5.1, the relations

$$[W_\lambda(\pi_1 \oplus \pi_2)] \sqsubseteq [W_\lambda(\pi_1) \cap W_\lambda(\pi_2)], \tag{5.12}$$

$$[W_\lambda(\pi_1 \otimes \pi_2)] \sqsupseteq [W_\lambda(\pi_1) \cup W_\lambda(\pi_2)] \tag{5.13}$$

hold. If $\langle T, \preceq \rangle$ is a complete upper semilattice, then
Theorem 2.1 (iv) yields that

$$[W_\lambda(\pi_1 \otimes \pi_2)] \sqsupseteq [W_\lambda(\pi_1)] \sqcup [W_\lambda(\pi_2)]; \tag{5.14}$$

if, moreover, $\langle T, \preceq \rangle$ is a lower semilattice, then
Theorem 2.1 (vi) yields that

$$[W_\lambda(\pi_1 \oplus \pi_2)] \sqsubseteq [W_\lambda(\pi_1)] \sqcap [W_\lambda(\pi_2)] \tag{5.15}$$

holds. Hence, under the conditions that $\langle T, \preceq \rangle$ is a
complete upper semilattice and, simultaneously, a lower semilattice, the
mapping $[W_\lambda(\cdot)]$ seems to be a reasonable
$\mathcal{P}(T)/\!\sim$-valued nonspecificity degree of
$\mathcal{P}(T)/\!\sim$-valued basic possibilistic assignments, copying in a
reasonable and nontrivial way some intuitive and acceptable properties of the
nonspecificity degrees $W$, $W_\lambda$ and their Boolean-valued counterpart
(5.9). These conditions imposed on the partially ordered set
$\langle T, \preceq \rangle$ seem to be the weakest ones under which such a
modification is possible and nontrivial.

6 Conclusions 

When considering possible applications of non-numerical uncertainty degrees in
general, and of non-numerical basic possibilistic assignments and belief
functions in particular, we can modify the basic paradigm used in the case of
probabilistically quantified and processed degrees of uncertainty. In that
case, elementary random events, mutually disjoint and defining a composition of
the certain event, are supposed to be endowed with non-negative probability
values summing to one. The assumption of additivity or $\sigma$-additivity,
together with the assumption of statistical (stochastic) independence of at
least some random events if they occur repeatedly, enables us to compute
probabilities for a large collection of random events defining a very rich
structure.

In the case of non-numerically quantified uncertainties we can start from a
structure of events such that the degrees of uncertainty of at least some of
them can be compared by the relation “greater than” or “greater than or equal
to”. The degrees of uncertainty of some events can be taken, by a subject, as
acceptable as far as the risk incurred by taking them as surely valid is
concerned, while some other degrees of uncertainty are taken as too great to
accept the same decision. In both cases the subject’s feelings are immediate
and are not based on numerical evaluations of these degrees of uncertainty by
real numbers, in particular those from the unit interval. The events, when
taken as sets, are structured by the relation of set-theoretic inclusion,
perhaps with some further demands imposed on this structure; their degrees of
uncertainty are structured by a partial ordering relation, and the aim is to
compute the degree of uncertainty of some events defined in a more
sophisticated way. Here “to compute” means to prove that the uncertainty
degrees of these more complex events are comparable either with those ascribed
to the elementary events, supposed to be known a priori, or with degrees of
uncertainty of events for which such a comparison has already been proved.

In particular, we can process, in this way, the non-numerical uncertainties
ascribed to events like “the actual state of the system in question is in an
investigated subset of $S$”, demanding answers of this kind: “the degree of
uncertainty of this event is at least as great as the degree of uncertainty
ascribed to an event $A$”, or “the degree of uncertainty of this event is
smaller than the degree of uncertainty ascribed to an event $B$”, in both cases
$A$ and $B$ being events from the elementary basis, so that the subject can
take advantage of the uncertainty degrees ascribed to them in her/his decision
making, thanks to her/his knowledge of the practical and extra-mathematical
circumstances of the system and decision-making problem under consideration.
E.g., a solution to a problem may be taken as good and fail-proof if we know
that the uncertainty describing the possibility of its failure is not greater
than the danger of a strong earthquake in our region, even if we perhaps do not
know the precise probability of the occurrence of such a catastrophe.

At least the following three problems or directions of further investigation
deserve to be taken into consideration.

(I) We have chosen, in this paper, a rather general approach in which degrees
of uncertainty are subsets of a partially ordered set. Consequently, the set of
uncertainty degrees can be endowed with two structures: the Boolean one,
generated on the power-set $\mathcal{P}(T)$ of the partially ordered set
$\langle T, \preceq \rangle$ by the usual set-theoretic operations and
relations (e.g., $\subseteq$, $\cap$, $\cup$), and the relations and operations
defined through the partial ordering relation $\preceq$ on $T$ (e.g.,
$\sqsubseteq$, $\sqcap$, $\sqcup$). A question arises whether it is possible to
obtain a similar model either with single-valued uncertainties, even if from a
larger set than $T$, or with set-valued uncertainties, but structured only by
usual set-theoretic operations and relations.

(II) In the author’s opinion, the conditions imposed, in this paper, on the
structure of the set of uncertainty degrees seem to be the weakest ones under
which a non-trivial fragment of the theory of belief functions can be built up.
Nevertheless, this conjecture should be restated in a more formalized way in
order to be either proved or rejected.

(III) It would be interesting and perhaps useful to look for a non-artificial
and rather practical structure of events charged with uncertainty such that
this structure would meet the demands imposed in this paper, but would not meet
some stronger demands required by, say, probabilistic models of decision making
under uncertainty.

Let us hope that at least some of these problems will be addressed by further
investigation.

The items [4] and [16] listed below may serve as good sources of elementary
knowledge concerning Boolean algebras, partial orderings and related structures.
The monographs [6] and [14] then provide the basic pieces of information
concerning measure theory in general and probability theory in particular, both
in their most abstract and mathematically formalized settings. [15] is one of
the pioneering monographs in the Dempster-Shafer theory of belief functions.
Some more references, thematically very close to the subject of this paper, are
also listed below.
Abstract. This paper is a self-contained presentation of a method for
combining several belief functions on a common frame that is different 
from a mere application of Dempster’s rule. All the necessary results 
and their proofs are presented in the paper. It begins with a review and 
explanation of concepts related to the notion of non-normalized mass- 
function, or gem-function, introduced by P. Smets under the name basic 
belief assignment [1,6]. Then the link with Dempster’s rule of combina- 
tion is established. Several results in relation with the notion of Dempster 
specialization matrix are proved for the first time [2]. Based on these re- 
sults, the method is then presented and a small application is considered. 



1 Mass-Functions and Gem-Functions 



In the Dempster-Shafer Theory of Evidence, it is well-known that a belief
function on a finite frame $\Theta$ can be equivalently represented by its
plausibility function, its commonality function or its basic probability
assignment [4,5]. In this paper, the basic probability assignment, also called
mass-function, will be used to represent a belief function. A mass-function is
a mapping

$$m : \mathcal{P}(\Theta) \to [0, 1]$$

satisfying the two conditions

$$m(\emptyset) = 0$$

$$\sum \{m(A) : A \subseteq \Theta\} = 1.$$

This notion of mass-function has been generalized by P. Smets by allowing the 
possibility to assign a positive mass to the empty set, which leads to the concepts 
of basic belief assignment and non-normalized belief function [1]. In this paper, 
such a generalized mass-function will be called a gem-function : 

Definition 1. A gem-function $g$ on a frame $\Theta$ is a mapping

$$g : \mathcal{P}(\Theta) \to [0, 1] \tag{1}$$

such that

$$\sum \{g(A) : A \subseteq \Theta\} = 1. \tag{2}$$
The difference with a mass-function is that a gem-function may assign a positive
mass to the empty set, which is not possible for a mass-function. Of course,
every mass-function is a gem-function but the converse is false. Also, a
gem-function is called proper if the value assigned to the empty set is strictly
smaller than 1. Note that a proper gem-function $g$ can be transformed into a
mass-function $m$ by normalization:

$$m(A) = \begin{cases} \dfrac{g(A)}{1 - g(\emptyset)} & \text{if } \emptyset \neq A \subseteq \Theta \\[1ex] 0 & \text{if } A = \emptyset. \end{cases}$$
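
As an illustration of the normalization step, here is a minimal sketch in which a gem-function is stored as a dictionary from subsets (frozensets) to exact fractions. The function name `normalize` and the example values are assumptions made for this sketch only.

```python
from fractions import Fraction


def normalize(g):
    """Normalize a proper gem-function given as a dict {frozenset: mass}."""
    empty = frozenset()
    g_empty = g.get(empty, Fraction(0))
    if g_empty >= 1:
        raise ValueError("g is not proper: g(empty set) must be strictly below 1")
    m = {a: v / (1 - g_empty) for a, v in g.items() if a != empty}
    m[empty] = Fraction(0)
    return m


# Hypothetical gem-function on the frame {a, b}.
g = {
    frozenset(): Fraction(1, 4),
    frozenset("a"): Fraction(1, 4),
    frozenset("ab"): Fraction(1, 2),
}
m = normalize(g)
print(m[frozenset("a")], m[frozenset("ab")])
```

The mass removed from the empty set is redistributed proportionally, so the values of `m` again sum to one.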



The commonality function associated with a belief function or its mass-function
$m$ is the mapping

$$q : \mathcal{P}(\Theta) \to [0, 1] \tag{3}$$

given by

$$q(A) = \sum_{B \supseteq A} m(B). \tag{4}$$

Similarly, for a gem-function $g$, we define the commonality function of $g$ as
being the mapping

$$q : \mathcal{P}(\Theta) \to [0, 1] \tag{5}$$

given by

$$q(A) = \sum_{B \supseteq A} g(B). \tag{6}$$

The commonality function is also called the $q$-function. Obviously, the
definitions given in equations (4) and (6) coincide if the gem-function $g$
happens to be a mass-function.
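
The definition in equation (6) can be sketched directly. The dictionary representation, the helper `subsets_of`, and the example gem-function are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction
from itertools import combinations


def subsets_of(frame):
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def commonality(g, frame):
    """Return {A: q(A)} with q(A) = sum of g(B) over all B containing A."""
    return {
        a: sum((v for b, v in g.items() if a <= b), Fraction(0))
        for a in subsets_of(frame)
    }


# Hypothetical gem-function on the frame {a, b}.
g = {
    frozenset(): Fraction(1, 4),
    frozenset("a"): Fraction(1, 4),
    frozenset("ab"): Fraction(1, 2),
}
q = commonality(g, "ab")
print(q[frozenset()], q[frozenset("a")], q[frozenset("b")], q[frozenset("ab")])
```

Since every subset contains the empty set, `q` of the empty set is always the total mass, that is, 1.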

Now recall the definition of Dempster’s rule of combination in terms of the
mass-functions of the belief functions that are being combined:

Definition 2. (Dempster’s rule) Let $Bel_1, \ldots, Bel_n$ be a family of belief
functions on the frame $\Theta$. If $m_i$ denotes the mass-function of $Bel_i$,
then the combined belief function exists if the value

$$k = \sum \Big\{ \prod_{i=1}^{n} m_i(A_i) : A_i \subseteq \Theta,\ \bigcap_{i=1}^{n} A_i = \emptyset \Big\}$$

is strictly smaller than 1. If the combined belief function exists, then it is
denoted by

$$Bel = Bel_1 \oplus \cdots \oplus Bel_n,$$

and its mass-function is

$$m(A) = \frac{\sum \big\{ \prod_{i=1}^{n} m_i(A_i) : A_i \subseteq \Theta,\ \bigcap_{i=1}^{n} A_i = A \big\}}{1 - k}$$

for all non-empty subsets $A \subseteq \Theta$ and of course $m(\emptyset) = 0$.
P. Smets [3] considers the combination of several gem-functions defined on
a frame $\Theta$. Since the present paper is placed in the classical framework
of the Dempster-Shafer Theory of Evidence, the combination of gem-functions is
considered here only as a technical tool, whereas for Smets it is an essential
component of his Transferable Belief Model with its own meaning and
interpretation. For this reason, we do not speak of the combination of several
gem-functions, but rather of the gem-function associated with a collection of
gem-functions.

Definition 3. Let

$$\mathcal{C} = \{g_1, \ldots, g_n\}$$

be a collection of gem-functions on $\Theta$ that are not necessarily different,
i.e. some gem-functions may appear several times in the collection. The
gem-function associated with the collection $\mathcal{C}$ is the mapping

$$g : \mathcal{P}(\Theta) \to [0, 1]$$

given by

$$g(A) = \sum \Big\{ \prod_{i=1}^{n} g_i(A_i) : A_i \subseteq \Theta,\ \bigcap_{i=1}^{n} A_i = A \Big\}$$

for all subsets $A$ of $\Theta$. The gem-function $g$ associated with
$\mathcal{C}$ is denoted by $gem(\mathcal{C})$ and to simplify the notation we
simply write $gem(g_1, \ldots, g_n)$ instead of $gem(\{g_1, \ldots, g_n\})$.

The mapping $gem(\mathcal{C})$ clearly satisfies conditions (1) and (2) and does
not depend on the order in which the elements of $\mathcal{C}$ are considered.
As the following definition shows, this allows us to speak about the
gem-function associated with a collection of belief functions because every
mass-function is a gem-function.

Definition 4. For $i = 1, \ldots, n$ let $m_i$ denote the mass-function of the
belief function $Bel_i$ on $\Theta$. Then the gem-function associated with the
collection of belief functions

$$\mathcal{B} = \{Bel_1, \ldots, Bel_n\}$$

is

$$gem(\mathcal{B}) = gem(m_1, \ldots, m_n).$$

By definition of Dempster’s rule of combination (see definition 2), the
combined belief function

$$Bel = Bel_1 \oplus \cdots \oplus Bel_n$$

exists if

$$gem(\mathcal{B})(\emptyset) < 1,$$

in which case its mass-function is obtained by normalization of the gem-function
$gem(\mathcal{B})$. Therefore, to find a way of combining several belief
functions that is different from a direct application of the definition of
Dempster’s rule, we need to find a method for computing $gem(\mathcal{C})$ that
is different from its definition. For this purpose, the notion of Dempster
specialization matrix is considered in the next section.
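
For reference, the direct computation that the rest of the paper seeks to avoid can be sketched as follows. This is a brute-force illustration of definitions 2 to 4 only; the function names `gem` and `dempster` and the two example mass-functions are assumptions made for the sketch.

```python
from fractions import Fraction
from itertools import product


def gem(collection):
    """gem(C)(A): sum over all (A_1, ..., A_n) with intersection A of prod g_i(A_i)."""
    out = {}
    for combo in product(*(g.items() for g in collection)):
        a = frozenset.intersection(*(s for s, _ in combo))
        w = Fraction(1)
        for _, v in combo:
            w *= v
        out[a] = out.get(a, Fraction(0)) + w
    return out


def dempster(masses):
    """Dempster's rule: normalize gem(m_1, ..., m_n) when the conflict is below 1."""
    g = gem(masses)
    conflict = g.pop(frozenset(), Fraction(0))
    if conflict >= 1:
        raise ValueError("the combined belief function does not exist")
    return {a: v / (1 - conflict) for a, v in g.items()}


# Two hypothetical mass-functions on the frame {a, b}.
m1 = {frozenset("a"): Fraction(1, 2), frozenset("ab"): Fraction(1, 2)}
m2 = {frozenset("b"): Fraction(1, 3), frozenset("ab"): Fraction(2, 3)}

g = gem([m1, m2])
m = dempster([m1, m2])
print(g[frozenset()], m[frozenset("a")], m[frozenset("b")], m[frozenset("ab")])
```

The loop visits every tuple of subsets, so its cost grows exponentially with the number of combined functions, which motivates the matrix approach that follows.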
2 Dempster Specialization Matrices

From now on, let

$$\{B_1, B_2, \ldots, B_n\}$$

denote the set of all subsets of $\Theta$ and it is assumed that the following
conditions are satisfied:

$$B_1 = \emptyset \quad \text{and} \quad (B_i \subseteq B_j \Rightarrow i \leq j). \tag{7}$$

Note that it is always possible to find such an ordering of the subsets of
$\Theta$. A gem-function $g$ is then completely specified by a column vector

$$g = (g(B_1), \ldots, g(B_n))'$$

and, similarly, a commonality function $q$ is completely specified by a column
vector

$$q = (q(B_1), \ldots, q(B_n))',$$

where the prime denotes the transpose of the vector. The following notion of 
Dempster specialization matrix introduced by Klawonn and Smets [2] will be 
useful in the sequel. 

Definition 5. Let $g$ be a gem-function on $\Theta$. Then the Dempster
specialization matrix of $g$ is the square matrix $S$ of order $n$ given by

$$S_{ij} = \sum \{g(K) : K \subseteq \Theta,\ K \cap B_j = B_i\}$$

for all $i$ and $j$ in $\{1, \ldots, n\}$.

If $U = B_i$ and $V = B_j$, then the element $S_{ij}$ of the matrix $S$ is also
denoted by

$$S(U, V) = \sum \{g(K) : K \cap V = U\}.$$

Also, in order to keep the notation as simple as possible, it will be sometimes
useful to simply write $i$ instead of $B_i$ and $j$ instead of $B_j$. With this
convention, the equation

$$S_{ij} = \sum \{g(k) : k \subseteq \Theta,\ k \cap j = i\}$$

still makes sense.
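
Definition 5 can be illustrated with a short sketch. As an assumption of this example (not of the paper), subsets are encoded as bitmasks and ordered by their integer value, which satisfies condition (7) with 0-based indices; the example gem-function is hypothetical.

```python
from fractions import Fraction


def specialization_matrix(g):
    """S[i][j] = sum of g[k] over subsets k with k & j == i (bitmask encoding)."""
    size = len(g)
    S = [[Fraction(0)] * size for _ in range(size)]
    for j in range(size):
        for k in range(size):
            S[k & j][j] += g[k]
    return S


# Hypothetical gem-function on a two-element frame; index 0 is the empty set,
# 1 and 2 are the singletons, 3 is the whole frame.
g = [Fraction(1, 4), Fraction(1, 4), Fraction(0), Fraction(1, 2)]
S = specialization_matrix(g)

for row in S:
    print([str(x) for x in row])
```

Each column of `S` sums to one, and the column of the whole frame reproduces `g` itself, because intersecting with the frame leaves every subset unchanged.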

Basically, a special case of the following result is mentioned in Klawonn and 
Smets [2], but unfortunately no proof is given there. In the form given below, 
the following theorem is stated and proved here for the first time. 

Theorem 1. Let $\{g_1, \ldots, g_{n+1}\}$ be a collection of gem-functions on
$\Theta$. If $S$ denotes the Dempster specialization matrix of $g_{n+1}$, then

$$gem(g_1, \ldots, g_{n+1}) = S \cdot gem(g_1, \ldots, g_n).$$
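Proof. For $k > 0$, let $\mathcal{P}(\Theta)^k$ denote the cartesian product of
$k$ copies of $\mathcal{P}(\Theta)$, i.e. an element of $\mathcal{P}(\Theta)^k$
is a $k$-dimensional vector whose components are subsets of $\Theta$. Let $A$ be
a fixed subset of $\Theta$. Then we define the set

$$\mathcal{M} = \Big\{(A_1, \ldots, A_{n+1}) \in \mathcal{P}(\Theta)^{n+1} : \bigcap_{i=1}^{n+1} A_i = A\Big\}$$

and for a subset $L$ of $\Theta$ let

$$\mathcal{U}_L = \Big\{(A_1, \ldots, A_n) \in \mathcal{P}(\Theta)^{n} : \bigcap_{i=1}^{n} A_i = L\Big\}$$

and

$$\mathcal{V}_L = \{K \in \mathcal{P}(\Theta) : K \cap L = A\}$$

and

$$\mathcal{T}_L = \mathcal{U}_L \times \mathcal{V}_L$$

and

$$\mathcal{N} = \bigcup \{\mathcal{T}_L : L \subseteq \Theta\}.$$

First we prove that $\mathcal{M} = \mathcal{N}$. Indeed, let

$$x = (A_1, \ldots, A_{n+1}) \in \mathcal{M}.$$

If we define $A_0 = \bigcap_{i=1}^{n} A_i$, then we show that

$$x \in \mathcal{T}_{A_0} = \mathcal{U}_{A_0} \times \mathcal{V}_{A_0},$$

which proves that $x \in \mathcal{N}$. But $(A_1, \ldots, A_n) \in
\mathcal{U}_{A_0}$ and $A_{n+1} \in \mathcal{V}_{A_0}$ because

$$A_{n+1} \cap A_0 = \bigcap_{i=1}^{n+1} A_i = A$$

since $x \in \mathcal{M}$. This implies that $x \in \mathcal{N}$ and hence
$\mathcal{M} \subseteq \mathcal{N}$.

Conversely, let

$$x = (A_1, \ldots, A_{n+1}) \in \mathcal{N}.$$

Then there exists $A_0 \subseteq \Theta$ such that $(A_1, \ldots, A_n) \in
\mathcal{U}_{A_0}$ and $A_{n+1} \in \mathcal{V}_{A_0}$. But then

$$\bigcap_{i=1}^{n+1} A_i = \Big(\bigcap_{i=1}^{n} A_i\Big) \cap A_{n+1} = A_0 \cap A_{n+1} = A,$$

which shows that $x \in \mathcal{M}$ and hence $\mathcal{N} \subseteq
\mathcal{M}$, which finally implies that $\mathcal{M} = \mathcal{N}$.

Obviously, the union in the definition of $\mathcal{N}$ is a union of disjoint
subsets because if $L \neq L'$, then

$$(\mathcal{U}_L \times \mathcal{V}_L) \cap (\mathcal{U}_{L'} \times \mathcal{V}_{L'}) = \emptyset.$$

The set $\mathcal{T}_L$ can be written as the disjoint union

$$\mathcal{T}_L = \bigcup \{\mathcal{U}_L \times \{K\} : K \in \mathcal{V}_L\}$$

and so if

$$\mathcal{G}_K = \mathcal{U}_L \times \{K\}$$

then

$$\mathcal{T}_L = \bigcup \{\mathcal{G}_K : K \in \mathcal{V}_L\}.$$

Now, let

$$g^* = gem(g_1, \ldots, g_{n+1}) \quad \text{and} \quad g = gem(g_1, \ldots, g_n).$$

Then, since $\mathcal{M} = \mathcal{N}$, we can write

$$g^*(A) = \sum_{(A_1, \ldots, A_{n+1}) \in \mathcal{M}} \prod_{i=1}^{n+1} g_i(A_i) = \sum_{(A_1, \ldots, A_{n+1}) \in \mathcal{N}} \prod_{i=1}^{n+1} g_i(A_i).$$

Then

$$g^*(A) = \sum_{L \subseteq \Theta} \Big( \sum_{(A_1, \ldots, A_{n+1}) \in \mathcal{T}_L} \prod_{i=1}^{n+1} g_i(A_i) \Big)$$

$$= \sum_{L \subseteq \Theta} \Big( \sum_{K \in \mathcal{V}_L} \Big( \sum_{(A_1, \ldots, A_n, K) \in \mathcal{G}_K} \Big( g_{n+1}(K) \prod_{i=1}^{n} g_i(A_i) \Big) \Big) \Big)$$

$$= \sum_{L \subseteq \Theta} \Big( \sum_{K \in \mathcal{V}_L} \Big( g_{n+1}(K) \Big( \sum_{(A_1, \ldots, A_n) \in \mathcal{U}_L} \prod_{i=1}^{n} g_i(A_i) \Big) \Big) \Big).$$

But this implies

$$g^*(A) = \sum_{L \subseteq \Theta} \Big( \sum_{K \in \mathcal{V}_L} \big( g_{n+1}(K)\, g(L) \big) \Big) = \sum_{L \subseteq \Theta} \Big( g(L) \Big( \sum_{K \in \mathcal{V}_L} g_{n+1}(K) \Big) \Big)$$

$$= \sum_{L \subseteq \Theta} \big( g(L)\, S(A, L) \big) = \sum_{L \subseteq \Theta} \big( S(A, L)\, g(L) \big),$$

which means that the value of $g^*$ on $A$ is obtained by multiplying the row of
the matrix $S$ corresponding to $A$ with the column vector $g$. But this simply
means that $g^* = S \cdot g$ and the theorem is proved. $\square$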
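
Theorem 1 can be checked numerically on a small case. The sketch below assumes a bitmask encoding of subsets (ordered by integer value) and uses three hypothetical gem-functions on a two-element frame; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def gem(collection):
    """gem(g_1, ..., g_n) computed from its definition, as a vector over bitmasks."""
    size = len(collection[0])
    out = [Fraction(0)] * size
    for combo in product(range(size), repeat=len(collection)):
        a, w = size - 1, Fraction(1)   # start from the whole frame
        for g, k in zip(collection, combo):
            a &= k                     # running intersection
            w *= g[k]                  # running product of masses
        out[a] += w
    return out


def specialization_matrix(g):
    size = len(g)
    S = [[Fraction(0)] * size for _ in range(size)]
    for j in range(size):
        for k in range(size):
            S[k & j][j] += g[k]
    return S


def matvec(S, v):
    return [sum((s * x for s, x in zip(row, v)), Fraction(0)) for row in S]


# Hypothetical gem-functions; index 0 is the empty set, 3 is the whole frame.
g1 = [Fraction(0), Fraction(1, 2), Fraction(0), Fraction(1, 2)]
g2 = [Fraction(0), Fraction(0), Fraction(1, 3), Fraction(2, 3)]
g3 = [Fraction(1, 4), Fraction(1, 4), Fraction(0), Fraction(1, 2)]

direct = gem([g1, g2, g3])
via_matrix = matvec(specialization_matrix(g3), gem([g1, g2]))
print(direct == via_matrix)
```

Both routes give the same vector, so the last gem-function can be absorbed by a single matrix-vector product.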



3 A Representation of the Dempster Specialization Matrix



In this section it will be shown that the Dempster specialization matrix can be 
diagonalized. 

Definition 6. The incidence matrix of the ordering $B_1, \ldots, B_n$ is the
square matrix $M$ of order $n$ given by

$$M_{ij} = \begin{cases} 1 & \text{if } B_i \subseteq B_j \\ 0 & \text{otherwise} \end{cases}$$

for all $i$ and $j$ in $\{1, \ldots, n\}$.
The matrix $M$ is an upper-triangular matrix because if $i > j$ then
$M_{ij} = 0$ because $B_i \not\subseteq B_j$ by the second condition in (7).
Also, $M$ is a regular matrix because $M_{ii} = 1$ for all $i = 1, \ldots, n$.
Now let $g$ be a gem-function given by the column vector

$$g = (g(B_1), \ldots, g(B_n))'.$$

If the commonality function of $g$ is given by the column vector

$$q = (q(B_1), \ldots, q(B_n))',$$

then we obviously have

$$Mg = q. \tag{8}$$



Definition 7. Let $q$ denote the commonality function of a gem-function $g$ on
the frame $\Theta$. Then the commonality matrix of $g$ is the square matrix $Q$
of order $n$ given by

$$Q_{ij} = \begin{cases} q(B_i) & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

for all $i$ and $j$ in $\{1, \ldots, n\}$.



The following theorem states that the Dempster specialization matrix of a gem- 
function is diagonalizable. This result is mentioned in Klawonn and Smets [2], but 
unfortunately no proof is given there. A proof of this interesting and important 
result is published here for the first time. 



Theorem 2. Let $S$ denote the Dempster specialization matrix and $Q$ the
commonality matrix of a gem-function $g$ on the frame $\Theta$. If $M$ is the
incidence matrix, then

$$S = M^{-1} Q M.$$



Proof. Let us define the matrices $A = MS$ and $C = QM$ and show that $A = C$,
which will prove the theorem. But, using the notation convention explained in
the previous section, we can write

$$A_{ij} = \sum \{M_{ik} S_{kj} : k \subseteq \Theta\} = \sum \{S_{kj} : k \supseteq i\} \tag{9}$$

$$= \sum \Big\{ \Big( \sum \{g(t) : t \cap j = k\} \Big) : k \supseteq i \Big\}. \tag{10}$$

Now suppose first that $i \not\subseteq j$. Then there is an element $x$ in
$\Theta$ such that $x \in i$ and $x \notin j$. But $i \subseteq k$ implies that
$x \in k$, and $t \cap j = k$ implies that $k \subseteq j$. On the other hand,
since $x \in k$ and $x \notin j$, it follows that $k \not\subseteq j$, which is
a contradiction to $k \subseteq j$. This shows that the double sum in equation
(10) is empty when $i \not\subseteq j$, which implies that $A_{ij} = 0$ when
$i \not\subseteq j$.

Now suppose that $i \subseteq j$. Then by equations (9) and (10)

$$A_{ij} = \sum \{g(t) : k \supseteq i,\ t \cap j = k\}.$$




Dempster Specialization Matrices and the Combination of Belief Functions 






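Now we define the sets

$$E = \{(k, t) : k \supseteq i,\ t \cap j = k\}$$

and

$$F = \{(k, t) : i \subseteq k \subseteq j,\ t = k \cup x,\ x \subseteq j^c\}$$

and show that $E = F$. Indeed, if $(k, t) \in E$, then $k \supseteq i$ and
$t \cap j = k$, which implies that $i \subseteq k \subseteq j$. But
$k \subseteq j$ and $t \cap j = k$ implies that $t = k \cup x$ for some
$x \subseteq j^c$, which shows that $(k, t) \in F$. Now suppose that
$(k, t) \in F$. Then

$$t \cap j = (k \cup x) \cap j = (k \cap j) \cup (x \cap j) = (k \cap j) = k$$

because $x \subseteq j^c$ and $k \subseteq j$, which shows that $(k, t) \in E$.
Therefore

$$A_{ij} = \sum \{g(t) : i \subseteq k \subseteq j,\ t = k \cup x,\ x \subseteq j^c\}$$

$$= \sum \{g(k \cup x) : i \subseteq k \subseteq j,\ x \subseteq j^c\}.$$

We define the sets

$$U = \{k \cup x : i \subseteq k \subseteq j,\ x \subseteq j^c\}$$

and $V = \{l : l \supseteq i\}$ and show that $U = V$. Indeed, if $k \cup x$ is
in $U$, then $i \subseteq k \subseteq k \cup x$ and hence $k \cup x \supseteq
i$, which shows that $k \cup x \in V$. Conversely, let $l \in V$, i.e.
$l \supseteq i$, and define

$$k = l \cap j \quad \text{and} \quad x = l \cap j^c.$$

Then

$$l = l \cap (j \cup j^c) = (l \cap j) \cup (l \cap j^c) = k \cup x$$

and, in order to show that $l$ is in $U$, we must prove

1. $i \subseteq k$, i.e. $i \subseteq l \cap j$, which is true because
   $i \subseteq l$, and $i \subseteq j$ by the general hypothesis.

2. $k \subseteq j$, i.e. $l \cap j \subseteq j$, which is clearly true.

3. $x \subseteq j^c$, i.e. $l \cap j^c \subseteq j^c$, which is also true.

But $U = V$ implies that

$$A_{ij} = \sum \{g(l) : l \supseteq i\} = q(i)$$

and therefore

$$A_{ij} = \begin{cases} q(i) & \text{if } i \subseteq j \\ 0 & \text{otherwise.} \end{cases}$$

On the other hand,

$$C_{ij} = \sum \{Q_{ik} M_{kj} : k \subseteq \Theta\} = \begin{cases} Q_{ii} & \text{if } i \subseteq j \\ 0 & \text{otherwise,} \end{cases}$$

which means that

$$C_{ij} = \begin{cases} q(i) & \text{if } i \subseteq j \\ 0 & \text{otherwise.} \end{cases}$$

This shows that $A_{ij} = C_{ij}$ for all $i$ and $j$ and hence $A = C$, which
proves the theorem. $\square$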
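
The proof works with the equivalent identity $MS = QM$, which avoids inverting $M$ and is easy to test. The sketch below assumes a bitmask encoding of subsets on a three-element frame and a hypothetical gem-function; it also checks equation (8).

```python
from fractions import Fraction


def incidence_matrix(size):
    """M[i][j] = 1 if subset i is contained in subset j (bitmask encoding)."""
    return [[Fraction(int((i & j) == i)) for j in range(size)] for i in range(size)]


def specialization_matrix(g):
    size = len(g)
    S = [[Fraction(0)] * size for _ in range(size)]
    for j in range(size):
        for k in range(size):
            S[k & j][j] += g[k]
    return S


def commonality_vector(g):
    """q[i] = sum of g[k] over all supersets k of i."""
    size = len(g)
    return [sum((g[k] for k in range(size) if (k & i) == i), Fraction(0))
            for i in range(size)]


def matmul(a, b):
    n = len(a)
    return [[sum((a[i][k] * b[k][j] for k in range(n)), Fraction(0)) for j in range(n)]
            for i in range(n)]


# Hypothetical gem-function on a three-element frame, indexed by bitmask 0..7.
g = [Fraction(x, 16) for x in (1, 2, 0, 3, 1, 4, 2, 3)]

M = incidence_matrix(8)
S = specialization_matrix(g)
q = commonality_vector(g)
Q = [[q[i] if i == j else Fraction(0) for j in range(8)] for i in range(8)]

Mg = [sum((M[i][k] * g[k] for k in range(8)), Fraction(0)) for i in range(8)]
print(matmul(M, S) == matmul(Q, M), Mg == q)
```

Because $M$ is regular, $MS = QM$ is the same statement as $S = M^{-1} Q M$.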
4 Combining Several Belief Functions

In this section a method for computing the combination of several belief functions 
is presented. This method is different from the mere application of the definition 
of Dempster’s rule. 

Theorem 3. For $i = 1, \ldots, k$, let $Q_i$ denote the commonality matrix of a
gem-function $g_i$ on the frame $\Theta$. If $M$ denotes the incidence matrix,
then

$$gem(g_1, \ldots, g_k) = M^{-1} \Big( \prod_{i=1}^{k-1} Q_i \Big) M g_k.$$

Proof. This theorem is proved by induction on $k$. For $k = 1$ we have

$$gem(g_1) = g_1 = M^{-1} I M g_1,$$

which proves the result when $k = 1$.

Now the induction step $(k - 1) \to k$ is proved. If $S$ denotes the Dempster
specialization matrix of $g_k$, then

$$gem(g_1, \ldots, g_k) = S \cdot gem(g_1, \ldots, g_{k-1})$$

by theorem 1. Then the induction hypothesis implies that

$$gem(g_1, \ldots, g_{k-1}) = M^{-1} \Big( \prod_{i=1}^{k-2} Q_i \Big) M g_{k-1}$$

and hence

$$gem(g_1, \ldots, g_k) = S M^{-1} \Big( \prod_{i=1}^{k-2} Q_i \Big) M g_{k-1}.$$

But by theorem 2

$$S = M^{-1} Q_k M$$

and therefore

$$gem(g_1, \ldots, g_k) = M^{-1} Q_k \Big( \prod_{i=1}^{k-2} Q_i \Big) M g_{k-1} = M^{-1} \Big( \prod_{i=1}^{k-2} Q_i \Big) Q_k M g_{k-1}$$

because $Q_k$ and $Q_1, \ldots, Q_{k-1}$ are diagonal matrices and hence their
products commute. If

$$e = (1, \ldots, 1)'$$

denotes the column vector composed of ones only and if $q_i$ denotes the
commonality function of the gem-function $g_i$, then

$$M g_i = q_i = Q_i e$$

for all $i = 1, \ldots, k$. Then, we can write

$$gem(g_1, \ldots, g_k) = M^{-1} \Big( \prod_{i=1}^{k-2} Q_i \Big) Q_k Q_{k-1} e = M^{-1} \Big( \prod_{i=1}^{k} Q_i \Big) e$$

$$= M^{-1} \Big( \prod_{i=1}^{k-1} Q_i \Big) Q_k e = M^{-1} \Big( \prod_{i=1}^{k-1} Q_i \Big) q_k = M^{-1} \Big( \prod_{i=1}^{k-1} Q_i \Big) M g_k,$$

which proves the theorem. $\square$
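
In practice, theorem 3 says that one may multiply the commonality vectors componentwise and then apply $M^{-1}$. The following sketch illustrates this under stated assumptions: subsets are encoded as bitmasks, $M^{-1}$ is applied by back substitution on the unit upper-triangular system $Mx = y$ (a choice made here, not prescribed by the paper), and the three gem-functions are hypothetical.

```python
from fractions import Fraction
from itertools import product


def gem_by_definition(collection):
    size = len(collection[0])
    out = [Fraction(0)] * size
    for combo in product(range(size), repeat=len(collection)):
        a, w = size - 1, Fraction(1)
        for g, k in zip(collection, combo):
            a &= k
            w *= g[k]
        out[a] += w
    return out


def commonality_vector(g):
    size = len(g)
    return [sum((g[k] for k in range(size) if (k & i) == i), Fraction(0))
            for i in range(size)]


def solve_incidence(y):
    """Solve M x = y by back substitution; M is unit upper triangular."""
    size = len(y)
    x = [Fraction(0)] * size
    for i in reversed(range(size)):
        # Row i of M has ones exactly at the supersets j of i.
        x[i] = y[i] - sum((x[j] for j in range(i + 1, size) if (i & j) == i), Fraction(0))
    return x


def gem_by_commonality(collection):
    size = len(collection[0])
    prod = [Fraction(1)] * size
    for g in collection:
        prod = [p * c for p, c in zip(prod, commonality_vector(g))]
    return solve_incidence(prod)


# Hypothetical gem-functions on a two-element frame (index 0 is the empty set).
g1 = [Fraction(0), Fraction(1, 2), Fraction(0), Fraction(1, 2)]
g2 = [Fraction(0), Fraction(0), Fraction(1, 3), Fraction(2, 3)]
g3 = [Fraction(1, 4), Fraction(1, 4), Fraction(0), Fraction(1, 2)]

direct = gem_by_definition([g1, g2, g3])
fast = gem_by_commonality([g1, g2, g3])
print(direct == fast)
```

The brute-force route enumerates all tuples of subsets, whereas the commonality route needs one componentwise product per function followed by a single triangular solve.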

This result can be used to compute the combination of several belief functions
on the same frame $\Theta$.

Corollary 1. For $i = 1, \ldots, k$, let $m_i$ denote the mass-function of a
belief function $Bel_i$ on the frame $\Theta$ and let $Q_i$ denote the
commonality matrix of $m_i$. Furthermore, let $M$ denote the incidence matrix.
If the combined belief function

$$Bel = Bel_1 \oplus \cdots \oplus Bel_k$$

exists, then its mass-function is obtained by normalization of the gem-function

$$g^* = M^{-1} \Big( \prod_{i=1}^{k-1} Q_i \Big) M m_k. \tag{11}$$

Proof. This is a direct consequence of theorem 3. $\square$

If $q_k$ denotes the commonality vector of $m_k$, then the gem-function $g^*$ in
(11) can also be written as

$$g^* = M^{-1} \Big( \prod_{i=1}^{k-1} Q_i \Big) q_k \tag{12}$$

because

$$M m_k = q_k.$$

As a special case, this result can be applied to the situation where several
copies of the same belief function must be combined by Dempster’s rule.

Corollary 2. Let $m$ denote the mass-function of a belief function $Bel$ on
$\Theta$ and let $Q$ denote the commonality matrix of $m$. Furthermore, let $M$
denote the incidence matrix and define $Bel_i = Bel$ for all $i = 1, \ldots, k$.
If the combined belief function

$$Bel^* = Bel_1 \oplus \cdots \oplus Bel_k$$

exists, then its mass-function is obtained by normalization of the gem-function

$$g^* = M^{-1} Q^{k-1} M m.$$

Proof. This result is a direct consequence of corollary 1. $\square$

In addition, if $q$ denotes the commonality vector of $m$, then

$$g^* = M^{-1} Q^{k-1} q$$

according to equation (12).



5 Application 

As a simple application of corollary 2, we consider the combination of $k$
copies of the same belief function $Bel$ on a frame $\Theta = \{\theta_1,
\theta_2\}$. The ordering of the subsets of $\Theta$ is taken to be

$$B_1 = \emptyset, \quad B_2 = \{\theta_1\}, \quad B_3 = \{\theta_2\}, \quad B_4 = \Theta.$$

The belief function $Bel$ on $\Theta$ is specified by its mass-function $m$
given by the column vector

$$m = (0,\ p,\ q,\ r)'$$

with $p + q + r = 1$. The commonality vector $q$ of $m$ is

$$q = (1,\ p + r,\ q + r,\ r)'$$

and the commonality matrix of $m$ is

$$Q = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & p+r & 0 & 0 \\ 0 & 0 & q+r & 0 \\ 0 & 0 & 0 & r \end{pmatrix}.$$

The incidence matrix $M$ is

$$M = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

and its inverse matrix is

$$M^{-1} = \begin{pmatrix} 1 & -1 & -1 & 1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Let $Bel^*$ denote the combination of $k$ copies of $Bel$ by Dempster’s rule of
combination, i.e.

$$Bel^* = Bel \oplus \cdots \oplus Bel \quad (k \text{ terms}).$$

Then the mass-function $m^*$ of $Bel^*$ is obtained by normalization of the
gem-function

$$g^* = M^{-1} Q^{k-1} q$$

$$= \big(1 - (p+r)^k - (q+r)^k + r^k,\ (p+r)^k - r^k,\ (q+r)^k - r^k,\ r^k\big)',$$

which yields

$$m^* = K^{-1} \big(0,\ (p+r)^k - r^k,\ (q+r)^k - r^k,\ r^k\big)'$$

where

$$K = (p+r)^k + (q+r)^k - r^k.$$
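
The closed form for $m^*$ can be compared with a step-by-step application of Dempster's rule. The sketch below is an illustration only: the numerical values of $p$, $q$, $r$ are hypothetical, and vectors follow the ordering $(\emptyset, \{\theta_1\}, \{\theta_2\}, \Theta)$, which coincides with the bitmask order 0, 1, 2, 3.

```python
from fractions import Fraction


def combine(m1, m2):
    """Dempster's rule for two mass vectors on a two-element frame."""
    g = [Fraction(0)] * 4
    for a in range(4):
        for b in range(4):
            g[a & b] += m1[a] * m2[b]   # bitwise AND is set intersection
    return [Fraction(0)] + [v / (1 - g[0]) for v in g[1:]]


def closed_form(p, q, r, k):
    """m* from the formula above, with K = (p+r)^k + (q+r)^k - r^k."""
    K = (p + r) ** k + (q + r) ** k - r ** k
    return [Fraction(0), ((p + r) ** k - r ** k) / K, ((q + r) ** k - r ** k) / K, r ** k / K]


# Hypothetical masses with p + q + r = 1.
p, q, r = Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)
m = [Fraction(0), p, q, r]

agreement = []
acc = m
for k in range(1, 6):
    agreement.append(acc == closed_form(p, q, r, k))
    acc = combine(acc, m)   # acc now holds the combination of k + 1 copies

print(agreement)
```

The iterated pairwise combination and the closed form agree for each tested $k$, as expected from the associativity of Dempster's rule.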
Abstract. The interpretations of belief functions and their relationships
with other uncertainty theories have been widely debated in the liter- 
ature. Focusing on the interpretation of belief functions based on non- 
negative masses, in this paper we provide a contribution to this topic 
by addressing two questions concerning the relationships between belief 
functions and coherent lower probabilities. The answers we provide to 
both questions tend to exclude the existence of intuitively appreciable 
relationships between the two theories, under the above mentioned inter- 
pretation. While this may be regarded as a confirmation of the conceptual 
autonomy of belief functions, we also propose future research about an 
alternative characterization, based on the notion of independence. 



1 Introduction 

The conceptual status of belief functions with respect to other uncertainty theo- 
ries has been extensively debated in the literature. This paper aims at providing 
a contribution to this topic, by considering the relationships with the theory of 
coherent lower probabilities [21], which encompasses belief functions as a special 
case. In particular, we focus on a distinguishing property of belief functions, 
namely non-negative masses. We analyze some examples and provide results 
showing that non-negative masses can hardly be given a meaningful
interpretation in this context. On the one hand, this result may be regarded as
a confirmation of the absence of conceptual liaisons between belief functions
and probability-based theories, as advocated for instance in [13]; on the other
hand, it highlights the opportunity of exploring alternative characterizations
of belief functions.

The paper is organized as follows. In section 2 we recall some basic aspects 
of the original version of belief functions theory, pointing out the difficulties 
in its conceptual characterization. Section 3 briefly surveys interpretations and 
debates concerning Shafer’s proposal, while section 4 analyzes its evolution in 
the Transferable Belief Model. In section 5 we pose two questions about the 
possibility of ascribing a meaningful interpretation to belief functions in the 
context of coherent imprecise probabilities and give a negative answer. Finally, 
in section 6 we summarize the results and point out future research directions. 
2 Shafer’s Theory

In Chapter 1 of [10] it is stated that, given a finite set $\Omega$ and its
powerset $2^\Omega$, if a function $Bel : 2^\Omega \to [0, 1]$ satisfies the
following conditions: i) $Bel(\emptyset) = 0$; ii) $Bel(\Omega) = 1$;
iii) $Bel(A_1 \cup \ldots \cup A_n) \geq \sum_i Bel(A_i) - \sum_{i<j} Bel(A_i
\cap A_j) + \cdots + (-1)^{n+1} Bel(A_1 \cap \ldots \cap A_n)$; then $Bel$ is
called a belief function over $\Omega$.

A different conceptual framework is given in Chapter 2 of [10], where Shafer
defines a basic probability assignment as a function $m : 2^\Omega \to [0, 1]$
such that $m(\emptyset) = 0$ and $\sum_{A \subseteq \Omega} m(A) = 1$; values of
the function $m$ are supposed to measure probability masses associated with
subsets. It is then stated that a function $Bel : 2^\Omega \to [0, 1]$ is a
belief function if, for some basic probability assignment $m : 2^\Omega \to
[0, 1]$, it is given by $Bel(A) = \sum_{B \subseteq A} m(B)$.
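
The passage from a basic probability assignment to its belief function can be sketched as follows. The frame, the masses, and the function name `belief` are hypothetical choices for illustration; the check covers only the case $n = 2$ of condition iii).

```python
from fractions import Fraction
from itertools import combinations


def belief(m, a):
    """Bel(A) = sum of m(B) over all B contained in A."""
    return sum((v for b, v in m.items() if b <= a), Fraction(0))


# Hypothetical basic probability assignment on the frame {x, y, z}.
omega = frozenset("xyz")
m = {
    frozenset("x"): Fraction(1, 2),
    frozenset("yz"): Fraction(1, 4),
    omega: Fraction(1, 4),
}

events = [frozenset(c) for r in range(4) for c in combinations(sorted(omega), r)]

# Condition iii) for n = 2: Bel(A u B) >= Bel(A) + Bel(B) - Bel(A n B).
two_monotone = all(
    belief(m, a | b) >= belief(m, a) + belief(m, b) - belief(m, a & b)
    for a in events
    for b in events
)
print(belief(m, frozenset("xy")), belief(m, omega), two_monotone)
```

Here the mass on the two-element set contributes to the belief of an event only when the event contains both of its elements, which is what makes the inequality strict for some pairs.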

The duplicate definition of belief functions in Shafer’s book indicates an
ambiguity in its foundational part, which is reflected in subsequent
presentations of the theory by other authors. For instance, in [5] belief
functions are defined using the properties i)-iii). On the other hand, Smets
[12] qualifies the definition based on these properties as unnatural, and
defines belief functions starting from a slightly modified notion of mass
function.

Later in his book, Shafer introduces another primitive concept, namely the 
weight of evidence, which represents a further source of complication about the 
conceptual foundations of belief functions. A weight of evidence is a real number 
in [0, oo]: it is assumed that there is a relationship between weights associated 
with evidence items and belief degrees derived from them. In order to charac- 
terize this relationship, Shafer starts with the case of a simple support function, 
namely a belief function corresponding to a mass function assigning a non-zero 
mass value m{A) to just one proper subset A of 17 and a mass value l — m{A) to 
17. A sequence of progressively more expressive classes of belief functions is then 
introduced. Each new class is provided with a characterization and an intuitive 
justification in terms of the previously defined ones. 

The first idea is that of combining simple support functions using Demp- 
ster’s rule: belief functions obtained by combining two or more simple support 
functions are called separable support functions. The dual process with respect 
to combination is decomposition, which consists of recovering from a separable 
support function a set of simple support functions generating it. Decomposition 
is, in general, not unique; however, a single canonical decomposition can be
identified by imposing some conditions on the simple support functions obtained.

The subsequent class of belief functions is related to the operations of coars- 
ening and refinement: support functions are the belief functions that can be 
obtained from separable support functions by coarsening the frame of
discernment. Shafer explicitly states that “they seem to constitute the subclass
of belief functions appropriate for the representation of evidence”. However,
there exist
belief functions which are not support functions: they are called quasi support 
functions and can be interpreted as limits of sequences of separable support 
functions or as restrictions of such limits. This interpretation, as Shafer admits, 
hardly has an intuitive counterpart: in summary, his book does not provide a 
satisfactory intuitive interpretation covering the whole set of belief functions. 
3 Interpretations and Criticisms of Shafer’s Theory

Several proposals have been formulated in order to provide a meaningful interpre- 
tation of belief function theory. Some of them rely on an underlying probabilistic 
model, such as the original notion of upper and lower probabilities introduced 
by Dempster [2], random codes [11], random sets [7], probabilities of provability 
[8]. An extension of evidence theory to the case of infinite frames of discernment,
called hint theory, has been proposed in [6]. There it is shown that the inner
and outer probability measures induced by a probability measure (which, in hint
theory, roughly corresponds to a basic probability assignment) yield an extended
notion of belief and plausibility functions, which preserves some important
properties, such as monotonicity of order $\infty$. A different kind of relationship
between inner and outer probability measures and belief and plausibility
functions for finite sets is pointed out in [3]. All these interpretations share the
assumption that the notion of probability is taken for granted.

Reasons for not sharing this assumption are given in several works (e.g. 
[14] [15]) proposing an alternative interpretation (and extension) of Shafer’s 
theory: the Transferable Belief Model (TBM), which rejects any relationship 
with probability theory and will be discussed more extensively in the next section.
Another problem in the relationship between belief functions and probability 
has been pointed out in [23], where it is shown that some of the postulates 
introduced in [10] about the notions of chance, belief, weight of evidence, and 
evidence combination are altogether inconsistent. According to [23], the most 
reasonable solution to this inconsistency consists of rejecting Shafer’s postulate 
that belief functions coincide with frequency limits, when they exist. 

Leaving interpretation aside, Dempster’s rule and its behavior are probably
the most extensively debated issue in this theory. A large corpus of literature 
exists, including many examples of supposedly counterintuitive results produced 
by Dempster’s rule and the relevant answers and/or counterexamples (see for 
instance [24] [9] [20] [14]). As a matter of fact, most of these works compare 
belief functions theory with precise probability theory and are focused on the 
so-called dynamic part of the model, involving conditioning and combination 
rules, while we are interested in the relationship with imprecise probabilities 
and in characterizing the static part of the model, namely the properties of 
belief functions as a representation formalism. 

4 The Transferable Belief Model 

Probably the most comprehensive attempt to provide a solid justification for the 
use of belief functions, both at the theoretical and intuitive level, is represented 
by the Transferable Belief Model (TBM) proposed by Smets and coauthors [12] 
[18]. Three complementary justifications have been proposed for this approach: 
i) axiomatic justification of belief functions representation; ii) axiomatic justifi- 
cation of Dempster’s rule; iii) intuitive justification based on positive masses. 

As far as point i) is concerned, a set of axioms is presented in [17], which, however,
is not completely satisfactory. In fact, the set of requirements is relatively
large (eleven in all) and, most importantly, some of them are questionable. For
instance, requirement 3 states that “Probability functions are credibility
functions”; this contrasts with previous statements by Smets himself, such
as “the transferable belief model is built without ever introducing explicitly or
implicitly any concept of probability” [13].

Turning to point ii), eight axioms are used in [13] to prove the uniqueness of
Dempster’s rule of combination. Among the axioms, one requires that there are at
least three elementary propositions (which is surely true in most cases, but sounds
peculiar as a requirement). Another axiom, called autofunctionality, is provided
without justification. Moreover, the fourth axiom postulates Dempster’s rule
of conditioning in order to derive Dempster’s rule of combination. It is however
worth remarking that Shafer regarded the notion of conditioning as substantially
extraneous to his theory: after presenting Dempster’s rule of conditioning, he says
“since new evidence rarely occurs in the form of a certainty, these formulas are
of little practical value” [10].

Finally, point iii) is strictly related to the subject of this paper. It is postulated
that there exists some finite amount of belief which is spread among the various
propositions (the subsets of $\Omega$) according to the available evidence [12]. In other
words, evidence is assumed to be directly represented by mass functions: this is
not in accordance with Shafer’s original view, as explained in section 2. An
attempt to reconcile the TBM with Shafer’s notions of simple support function and
canonical decomposition is proposed in [16], where the definition of simple
support function is generalized to the cases where a negative mass value is assigned
to a proper subset $A$ of $\Omega$. A simple support function with negative mass is
supposed to represent a reason not to believe that the current state of the world is in
$A$. It is then possible to define a generalized canonical decomposition of any
non-dogmatic belief function (i.e. any belief function with $m(\Omega) > 0$) into generalized
simple support functions. However, this proposal leaves some conceptual gaps
open. In fact, in order to take into account the “reasons not to believe”, a
“latent belief structure” is introduced, which includes a confidence component and
a diffidence component, both represented by belief functions. Then an apparent
belief structure, which is again a belief function, is derived from the latent belief
structure. The apparent belief structure is assumed to be the belief representation
usually adopted in the TBM. However, as Smets points out, there are some
latent belief structures that cannot be represented by apparent belief structures,
since the latter “are not rich enough to characterize every belief state”.
Moreover, there are cases where the decomposition into
generalized simple support functions does not seem to fit the interpretation of
“reasons not to believe”. Consider the following example with $\Omega = \{a, b, c\}$ and
the mass assignment $m(\{a\}) = m(\{b\}) = m(\{c\}) = m(\Omega) = 0.25$. By applying
the canonical decomposition proposed in [16], it is easy to see that in this case one
of the resulting generalized simple support functions assigns a negative mass to
the empty set. The idea that there can be some evidence that gives reasons not
to believe in the empty set is somehow difficult to accept from an intuitive point
of view.
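
The three-element example can be reproduced numerically. The sketch below is only an illustration and rests on an assumption: it uses the product formula commonly given for the canonical decomposition of a non-dogmatic belief function, $w(A) = \prod_{B \supseteq A} q(B)^{(-1)^{|B|-|A|+1}}$ for every proper subset $A$ of the frame, where $q$ is the commonality function, and it assumes that this is the decomposition meant in [16]. A generalized simple support function focused on $A$ then carries mass $1 - w(A)$ on $A$ and $w(A)$ on the frame, so a weight above 1 means a negative mass on $A$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def powerset(frame):
    """Return all subsets of `frame` as frozensets."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def commonality(mass, frame):
    """q(A) = sum of m(B) over all focal sets B that contain A."""
    return {A: sum((v for B, v in mass.items() if A <= B), Fraction(0))
            for A in powerset(frame)}


def decomposition_weights(mass, frame):
    """Weights w(A) for every proper subset A (assumed product formula)."""
    q = commonality(mass, frame)
    weights = {}
    for A in powerset(frame):
        if A == frame:
            continue
        w = Fraction(1)
        for B in powerset(frame):
            if A <= B:
                # The exponent is (-1)^(|B| - |A| + 1).
                exponent = 1 if (len(B) - len(A) + 1) % 2 == 0 else -1
                w *= q[B] ** exponent
        weights[A] = w
    return weights


frame = frozenset("abc")
quarter = Fraction(1, 4)
mass = {frozenset("a"): quarter, frozenset("b"): quarter,
        frozenset("c"): quarter, frame: quarter}

weights = decomposition_weights(mass, frame)
for A, w in weights.items():
    # A generalized simple support function puts 1 - w on A and w on the frame.
    print(sorted(A), "w =", w, "mass on A =", 1 - w)
```

Under this assumed formula the weight attached to the empty set exceeds 1, so the corresponding generalized simple support function carries a negative mass on the empty set, in line with the remark in the text.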

5 Belief Functions vs. Coherent Lower Probabilities

For the reasons explained in the previous section, non-negative masses should be
regarded as the intuitively most acceptable justification of the TBM framework.
As is well known, the mass function is in one-to-one correspondence with the
corresponding belief function through the so-called Möbius inversion. Assuming
non-negative masses entails that belief functions are capacities of order $\infty$; it
has been observed that functions with weaker properties can also be considered for
describing a state of partial information [1]. In particular, a theory of coherent
imprecise probabilities has been developed in [21]. It is assumed that the following
conjugacy relation holds between the lower ($\underline{P}$) and the upper ($\overline{P}$)
probability of an event $E$: $\overline{P}(E) = 1 - \underline{P}(\neg E)$. This enables one to
consider lower probabilities only, whenever they are defined on a set of events
closed under complementation.
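
As an illustration of the Möbius inversion and of the conjugacy relation, the following minimal sketch builds a belief function from a mass function on a three-element frame, recovers the masses by inversion, and derives the conjugate upper function. The frame, the mass values, and the function names are assumptions chosen for the example; they do not come from the paper.

```python
from fractions import Fraction
from itertools import combinations


def subsets(s):
    """Return all subsets of `s` as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def belief_from_mass(mass, frame):
    """bel(A) = sum of m(B) over non-empty B contained in A."""
    return {A: sum((v for B, v in mass.items() if B and B <= A), Fraction(0))
            for A in subsets(frame)}


def mobius_inverse(bel, frame):
    """m(A) = sum over B subset of A of (-1)^|A - B| * bel(B)."""
    return {A: sum(((-1) ** len(A - B) * bel[B] for B in subsets(A)),
                   Fraction(0))
            for A in subsets(frame)}


def plausibility(bel, frame):
    """Conjugate function: pl(A) = 1 - bel(complement of A)."""
    return {A: 1 - bel[frame - A] for A in subsets(frame)}


frame = frozenset("abc")
mass = {frozenset("a"): Fraction(1, 2),
        frozenset("ab"): Fraction(3, 10),
        frame: Fraction(1, 5)}

bel = belief_from_mass(mass, frame)
recovered = mobius_inverse(bel, frame)
pl = plausibility(bel, frame)
print({tuple(sorted(A)): v for A, v in recovered.items() if v != 0})
```

The same `mobius_inverse` can be applied to any set function defined on all subsets, including a lower probability that is not a belief function, in which case some of the returned values may be negative.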

Coherent lower probabilities (CLP) are defined as a special case of lower
previsions in 2.5.1 (see also 2.7) of [21]: given an arbitrary (finite or not) set of
events $S$, $\underline{P}(\cdot)$ is a coherent lower probability on $S$ iff,

$\forall m$, $\forall E_0, \ldots, E_m \in S$, $\forall s_i \geq 0$, $i = 0, \ldots, m$, defining $I(E)$ as the indicator of
$E$ ($I(E) = 1$ if $E$ is true, $I(E) = 0$ if $E$ is false) and putting

\[ G = \sum_{i=1}^{m} s_i [I(E_i) - \underline{P}(E_i)] - s_0 [I(E_0) - \underline{P}(E_0)] , \]

it is true that $\max G \geq 0$.

Coherent lower probabilities have a clear behavioral interpretation [21] and
encompass several existing theories, including belief functions, as special cases
[22]. In the sequel we shall exploit the characterization of CLP via the envelope
theorem, which assures, in particular, that the lower envelope of a family of precise
probabilities is a coherent lower probability, thus giving a useful practical way to
construct imprecise probabilities. Examples of CLP whose Möbius inverse yields
negative mass values are provided in [21] [22]. To our knowledge, little work
has been done so far to investigate whether it is possible to characterize belief
functions as a class of CLP with intuitively appreciable features. Moreover, it
is worth noting that belief functions have most often been compared either with
precise probabilities, where the most evident difference concerns the representation
of ignorance, or with classes of so-called upper and lower probabilities, where
the existence of an ill-known precise probability is somehow assumed, an
assumption which is instead rejected in the TBM approach [15]. In Walley’s approach
lower probabilities are not assumed to be an approximation of an underlying precise
probability [22]; therefore they do not seem to contrast with the basic assumptions
of the TBM. Similarly, coherent imprecise probabilities are conceptually distinct from
the inner and outer measures mentioned in section 3, which represent the extension
of a precise probability measure to nonmeasurable subsets. For these reasons,
it seems that the relationships between CLP and TBM deserve some further
investigation, which we start in this work by examining the following questions:

1. Can mass sign be ascribed an intuitive meaning in the context of CLP?

2. Independently of mass sign interpretation, do belief functions have distinct
properties as a subclass of CLP?

In order to provide a first answer to these questions, we restrict our analysis to
a frame of discernment with three elements, namely we assume $\Omega = \{e_1, e_2, e_3\}$.
Even in this limited framework, it is easy to find examples of CLP where
$m(\Omega) < 0$. We note that, for $|\Omega| = 3$, $m(\Omega)$ has the following optimal lower bound:

Proposition 1 Let $\Omega$ be a set with $|\Omega| = 3$, $\underline{P}(\cdot)$ a coherent lower probability
defined on $2^\Omega$, and $m$ the mass function on $2^\Omega$ obtained from $\underline{P}$ by Möbius
inversion; then $m(\Omega) \geq -1/2$.



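By Möbius inversion we have that

\[ m(\Omega) = 1 - \underline{P}(e_1 \vee e_2) - \underline{P}(e_1 \vee e_3) - \underline{P}(e_2 \vee e_3) + \sum_{i=1}^{3} \underline{P}(e_i) . \tag{1} \]

Applying the conjugacy relation $\underline{P}(e_1 \vee e_2) = 1 - \overline{P}(e_3)$ and the analogous
ones we obtain:

\[ m(\Omega) = -2 + \sum_{i=1}^{3} \left( \underline{P}(e_i) + \overline{P}(e_i) \right) . \tag{2} \]

In order to minimize (2) let us recall the following necessary condition for
coherence (see 2.7.4 of [21]):

\[ 1 - \underline{P}(e_i) \leq \overline{P}(e_j) + \overline{P}(e_k), \quad i \neq j \neq k . \tag{3} \]

By summing the three inequalities corresponding to (3) we obtain:

\[ 3 - \sum_{i=1}^{3} \underline{P}(e_i) \leq 2 \sum_{i=1}^{3} \overline{P}(e_i) . \tag{4} \]

By exploiting (4) we obtain

\[ \sum_{i=1}^{3} \left( \underline{P}(e_i) + \overline{P}(e_i) \right) \geq 3/2 + \frac{1}{2} \sum_{i=1}^{3} \underline{P}(e_i) . \tag{5} \]

We can derive the minimum value for the left part of (5) by assuming the
equality and putting $\underline{P}(e_i) = 0$, $\forall i$: it can be easily seen that this entails
$\overline{P}(e_i) = 1/2$, $\forall i$. This is a coherent imprecise probability assignment, which gives the
actual minimal value of equation (2), yielding $m(\Omega) = -2 + 3/2 = -1/2$.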
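
The bound of Proposition 1 can be sanity-checked numerically. The sketch below relies on the envelope theorem recalled earlier (the lower envelope of a family of precise probabilities is a coherent lower probability): it samples random families of precise probabilities on three elements, computes $m(\Omega)$ from their lower envelope through equation (1), and compares the smallest value found with the extremal assignment used in the proof. The sampling scheme, the seed, the sample size, and the function names are arbitrary choices made for this illustration; a random search of this kind can only fail to find a counterexample and proves nothing by itself.

```python
import random
from itertools import combinations


def lower_probability(family, event):
    """Lower envelope: smallest probability of `event` over the family."""
    return min(sum(p[i] for i in event) for p in family)


def mass_of_frame(family):
    """m(Omega) for three elements, following equation (1)."""
    singles = sum(lower_probability(family, (i,)) for i in range(3))
    pairs = sum(lower_probability(family, pair)
                for pair in combinations(range(3), 2))
    return 1 - pairs + singles


def random_probability(rng):
    """A random precise probability on three elements."""
    cut1, cut2 = sorted((rng.random(), rng.random()))
    return (cut1, cut2 - cut1, 1 - cut2)


rng = random.Random(0)  # fixed seed, arbitrary choice
smallest = min(
    mass_of_frame([random_probability(rng) for _ in range(rng.randint(1, 4))])
    for _ in range(20000)
)

# Family whose lower envelope gives lower probability 0 and upper
# probability 1/2 on every singleton, as in the proof.
extremal = [(0.0, 0.5, 0.5), (0.5, 0.0, 0.5), (0.5, 0.5, 0.0)]
print("smallest sampled m(Omega):", smallest)
print("extremal family m(Omega):", mass_of_frame(extremal))
```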



Finding an optimal lower bound for the mass function with $|\Omega| > 3$ is more
complex. It is however significant to examine cases of imprecise probability
assignments featuring a similar structure with a different cardinality of $\Omega$. We
consider therefore the lower probability assignments on a generic $\Omega = \{e_1, \ldots, e_n\}$,
obtained as lower envelope of a set of $n$ precise probabilities defined as follows:

\[
\begin{array}{llll}
P_1(e_1) = 0; & P_1(e_2) = \frac{1}{|\Omega|-1}; & \cdots & P_1(e_n) = \frac{1}{|\Omega|-1}; \\
P_2(e_1) = \frac{1}{|\Omega|-1}; & P_2(e_2) = 0; & \cdots & P_2(e_n) = \frac{1}{|\Omega|-1}; \\
\vdots & & & \\
P_n(e_1) = \frac{1}{|\Omega|-1}; & P_n(e_2) = \frac{1}{|\Omega|-1}; & \cdots & P_n(e_n) = 0 .
\end{array}
\tag{6}
\]

The lower probability assignment minimizing (2) is recovered for $n = 3$. We
are therefore interested in verifying whether the value of $m(\Omega)$ has a consistent
characterization as $n$, i.e. $|\Omega|$, varies. The following result gives a
negative answer: the sign of $m(\Omega)$ alternates as $|\Omega|$ varies.

Proposition 2 Let $\underline{P}_n$ be the lower probability obtained as lower envelope of
the probabilities defined by (6) for a given $n$, and $m_n$ the corresponding mass
function obtained by Möbius inversion; then $m_n(\Omega)$ is positive when $n$ is even
and negative when $n$ is odd.



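First of all, let us note that, in the considered case, we have $\forall A \subset \Omega$, $A \neq \emptyset$,
$A \neq \Omega$, $\overline{P}_n(A) = \frac{|A|}{n-1}$, $\underline{P}_n(A) = \frac{|A|-1}{n-1}$ (these values are easily obtained as the
maximum and minimum of the precise probability values $P_i(A)$ in the family
defined by (6)). Recalling that $m_n(\Omega) = \sum_{A \subseteq \Omega} (-1)^{|\Omega - A|} \underline{P}_n(A)$ we easily obtain:

\[ m_n(\Omega) = \sum_{i=1}^{n} (-1)^{n-i} \binom{n}{i} \frac{i-1}{n-1} . \tag{7} \]

First, let us recall from combinatorial calculus that:

\[ \sum_{i=0}^{n} (-1)^{i} \binom{n}{i} = \sum_{i=0}^{n} (-1)^{n-i} \binom{n}{i} = 0 . \tag{8} \]

Equation (7) can also be written as:

\[ m_n(\Omega) = \frac{1}{n-1} \left[ \sum_{i=1}^{n} (-1)^{n-i} \binom{n}{i} \, i - \sum_{i=1}^{n} (-1)^{n-i} \binom{n}{i} \right] . \tag{9} \]

As to the first summation in (9) we have:

\[ \binom{n}{i} \, i = n \binom{n-1}{i-1} . \tag{10} \]

Putting $h = i - 1$ and recalling (8) we obtain:

\[ \sum_{i=1}^{n} (-1)^{n-i} \binom{n}{i} \, i = n \sum_{i=1}^{n} (-1)^{n-i} \binom{n-1}{i-1} = n \sum_{h=0}^{n-1} (-1)^{n-(h+1)} \binom{n-1}{h} = 0 . \tag{11} \]

As for the second summation in (9):

\[ \sum_{i=1}^{n} (-1)^{n-i} \binom{n}{i} = \left( \sum_{i=0}^{n} (-1)^{n-i} \binom{n}{i} \right) - (-1)^{n} \binom{n}{0} = -(-1)^{n} . \tag{12} \]

Exploiting (11) and (12) in (9), we obtain the desired result:

\[ m_n(\Omega) = \frac{(-1)^{n}}{n-1} . \tag{13} \]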
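
The alternating sign stated in Proposition 2 can be checked directly for small frames. The sketch below builds the $n$ precise probabilities of (6), takes their lower envelope on every non-empty event, and computes $m_n(\Omega)$ by Möbius inversion with exact rational arithmetic, printing it next to the value $(-1)^n/(n-1)$. The function names and the range of $n$ are choices made for this illustration.

```python
from fractions import Fraction
from itertools import combinations


def family(n):
    """The n precise probabilities of (6): P_i(e_i) = 0, P_i(e_j) = 1/(n-1)."""
    return [[Fraction(0) if j == i else Fraction(1, n - 1) for j in range(n)]
            for i in range(n)]


def mass_of_frame(n):
    """m_n(Omega) by Moebius inversion of the lower envelope."""
    probs = family(n)
    total = Fraction(0)
    for size in range(1, n + 1):          # the empty event contributes 0
        for event in combinations(range(n), size):
            lower = min(sum(p[j] for j in event) for p in probs)
            total += (-1) ** (n - size) * lower
    return total


for n in range(2, 9):
    print(n, mass_of_frame(n), Fraction((-1) ** n, n - 1))
```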

Proposition 2 suggests a negative answer to the first question we posed.
Starting from the lower probability corresponding to the minimal value of $m(\Omega)$
for $|\Omega| = 3$, we have identified a family of CLP where the sign of $m(\Omega)$ varies
with the cardinality of $\Omega$ itself. Apart from cardinality, it does not seem that,
in this family, the cases where $m(\Omega) < 0$ feature any significant difference with
respect to the ones with $m(\Omega) > 0$, nor does the fact that the cardinality of $\Omega$ is
even or odd appear to be particularly meaningful. Given the lack of significance in
these cases, we also tend to exclude that the mass sign can be ascribed any
intuitively appreciable general meaning in the context of CLP.

Let us turn to the second question. We restrict our analysis to the case of
$|\Omega| = 3$ and analyze another generalization of the lower probability minimizing
$m(\Omega)$ in this case. We consider a family of lower probabilities obtained as lower
envelope of three parametric precise probability assignments:

\[
\begin{array}{lll}
P_1(e_1) = k; & P_1(e_2) = \alpha(1-k); & P_1(e_3) = (1-\alpha)(1-k); \\
P_2(e_1) = (1-\alpha)(1-k); & P_2(e_2) = k; & P_2(e_3) = \alpha(1-k); \\
P_3(e_1) = \alpha(1-k); & P_3(e_2) = (1-\alpha)(1-k); & P_3(e_3) = k;
\end{array}
\tag{14}
\]

where $k \in [0, 1]$; $\alpha \in [0, 0.5]$ (note that this implies $\alpha(1-k) \leq (1-\alpha)(1-k)$).
Given $|\Omega| = 3$, (14) is a generalization of (6), which is recovered for $k = 0$,
$\alpha = 0.5$.

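The resulting parametric lower probability is as follows:

\[ \underline{P}(e_i) = \min(k, \alpha(1-k)) \quad \forall i \in \{1,2,3\}; \tag{15} \]

\[ \underline{P}(e_i \vee e_j) = \alpha(1-k) + \min(k, (1-\alpha)(1-k)) \quad \forall i, j \in \{1,2,3\}, \; i \neq j . \tag{16} \]

The corresponding mass function is:

\[ m(e_i) = \underline{P}(e_i) \quad \forall i \in \{1,2,3\}; \tag{17} \]

\[ m(e_i \vee e_j) = \underline{P}(e_i \vee e_j) - 2\underline{P}(e_i) \quad \forall i, j \in \{1,2,3\}, \; i \neq j; \tag{18} \]

\[ m(\Omega) = 1 - 3\underline{P}(e_i \vee e_j) + 3\underline{P}(e_i) . \tag{19} \]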



Since the parametric family of CLP defined above includes both belief
functions and non belief functions, we are interested in characterizing the subset of
belief functions within it.

It is easy to see that, as $\alpha$ and $k$ vary, $m(e_i \vee e_j) \geq 0$; therefore we
have just to characterize the pairs $(\alpha, k)$ for which $m(\Omega) < 0$. By applying
(15) and (16) into (19) we obtain

\[ m(\Omega) = 1 - 3\alpha(1-k) - 3\min(k, (1-\alpha)(1-k)) + 3\min(k, \alpha(1-k)) . \tag{20} \]

Several cases have to be considered:

1. for $k \leq \alpha(1-k)$, equation (20) yields:

\[ m(\Omega) = 1 - 3\alpha(1-k) \tag{21} \]

in this case $m(\Omega) < 0$ for $\alpha > \frac{1}{3(1-k)}$;

2. for $\alpha(1-k) \leq k \leq (1-\alpha)(1-k)$, equation (20) yields:

\[ m(\Omega) = 1 - 3k \tag{22} \]

in this case $m(\Omega) < 0$ for $k > 1/3$;

3. for $k \geq (1-\alpha)(1-k)$, equation (20) yields:

\[ m(\Omega) = 1 - 3(1-\alpha)(1-k) \tag{23} \]

in this case $m(\Omega) < 0$ for $\alpha < \frac{2-3k}{3(1-k)}$.

The equations above identify three regions of the space of the pairs $(\alpha, k)$,
each including both pairs corresponding to belief functions and pairs corresponding
to non belief functions: this is graphically presented in Figure 1. As Figure 1 shows,
belief functions do not represent a distinct region in the considered space: there are
two different areas corresponding to belief functions. These areas intersect in a
single point, corresponding to the only precise probability included in this family.
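
The three closed forms can be cross-checked against a direct computation. The sketch below builds the three precise probabilities of (14), takes their lower envelope, obtains $m(\Omega)$ by Möbius inversion, and compares the result with the piecewise expressions (21)-(23) on a grid of rational values of $\alpha$ and $k$. The grid resolution and the function names are arbitrary choices made for this illustration.

```python
from fractions import Fraction
from itertools import combinations


def family(alpha, k):
    """The three precise probabilities of (14) on (e1, e2, e3)."""
    a, b = alpha * (1 - k), (1 - alpha) * (1 - k)
    return [(k, a, b), (b, k, a), (a, b, k)]


def mass_of_frame(alpha, k):
    """m(Omega) by Moebius inversion of the lower envelope of (14)."""
    probs = family(alpha, k)

    def lower(event):
        return min(sum(p[i] for i in event) for p in probs)

    singles = sum(lower((i,)) for i in range(3))
    pairs = sum(lower(pair) for pair in combinations(range(3), 2))
    return 1 - pairs + singles


def piecewise(alpha, k):
    """Closed forms (21)-(23), selected by the position of k."""
    a, b = alpha * (1 - k), (1 - alpha) * (1 - k)
    if k <= a:
        return 1 - 3 * a
    if k <= b:
        return 1 - 3 * k
    return 1 - 3 * b


grid = [(Fraction(i, 20), Fraction(j, 20))
        for i in range(11) for j in range(21)]
mismatches = [(al, k) for al, k in grid
              if mass_of_frame(al, k) != piecewise(al, k)]
print("mismatches:", mismatches)
print("grid points with m(Omega) >= 0:",
      sum(1 for al, k in grid if mass_of_frame(al, k) >= 0), "of", len(grid))
```

Since the masses on singletons and pairs are non-negative throughout the family, the grid points with $m(\Omega) \geq 0$ are exactly those corresponding to belief functions.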




Fig. 1. Belief and non belief functions in the $(\alpha, k)$ space.



Figure 2 provides another view of the same result: the bold lines represent
the values of $\underline{P}(e_i)$ and $\overline{P}(e_i)$ respectively, as $k$ varies, for the case $\alpha = 0.4$
(a similar figure would be obtained for any $1/3 < \alpha < 1/2$, while other values of
$\alpha$ give rise to simpler cases). Note that the values $\underline{P}(e_i)$ and $\overline{P}(e_i)$ completely
define an imprecise probability assignment in the considered case. Vertical lines
separate the regions with different sign of $m(\Omega)$, i.e. imprecise probabilities which
are belief functions from those which are not. The position of the vertical lines
derives directly from (21) - (23); the corresponding geometric constructions are
not shown, since they would make the figure unreadable. Again, it does not seem
that belief functions, which span two separate regions, can be characterized
as an intuitively meaningful subset. In particular, the borderlines between belief
and non belief functions do not correspond to meaningful variations in the
underlying upper-lower probability intervals. In other words, it seems that
belief functions cannot be given a robust intuitive characterization as a class of
imprecise probabilities, since infinitesimal (and not otherwise significant)
variations of upper and lower probability values may make the difference.
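
The sensitivity claimed here can be made concrete for $\alpha = 0.4$. The sketch below evaluates the lower and upper probability of a singleton and $m(\Omega)$ just below and just above $k = 1/3$, using the expressions (15), (16) and (19); the upper probability of a singleton is taken as the upper envelope of the family (14), which is an elaboration not spelled out in the text. The step `eps` and the function name are arbitrary choices made for this illustration.

```python
from fractions import Fraction


def describe(alpha, k):
    """Lower and upper probability of a singleton, and m(Omega), for (14)."""
    a, b = alpha * (1 - k), (1 - alpha) * (1 - k)
    lower = min(k, a)                   # lower envelope on a singleton, (15)
    upper = max(k, b)                   # upper envelope on a singleton
    pair = a + min(k, b)                # lower envelope on a pair, (16)
    m_omega = 1 - 3 * pair + 3 * lower  # equation (19)
    return lower, upper, m_omega


alpha = Fraction(2, 5)
eps = Fraction(1, 1000)                 # arbitrary small step
before = describe(alpha, Fraction(1, 3) - eps)
after = describe(alpha, Fraction(1, 3) + eps)
print("k = 1/3 - eps:", before)
print("k = 1/3 + eps:", after)
```

The two probability intervals differ by less than the change in $k$, yet $m(\Omega)$ changes sign between the two points, so only the first assignment is a belief function.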




Fig. 2. $\underline{P}(e_i)$, $\overline{P}(e_i)$, and sign of $m(\Omega)$ for $\alpha = 0.4$



This result suggests a negative answer also to the second question we posed:
the regions corresponding to belief functions do not seem to identify an
intuitively sensible subclass within the set of coherent imprecise probabilities
considered. While belief functions feature distinct mathematical properties, such as
the mass sign, it does not seem possible to point out any meaningful
distinction between those imprecise probabilities which are belief functions and
those which, possibly by an infinitesimal difference, are not. As pointed out by
one of the referees, it should also be noted that we have selected a specific cut,
where the regions are not connected, while it may be the case that, in a
higher-dimensional space, they form a connected region, which may look nice after
some transformations. While we welcome this objection, we remark that the cut
we used is a simple and “natural” one: alternative cuts and transformations in
more complex spaces would in turn require an articulated (and possibly difficult
to find) intuitive justification.

6 Conclusions 

Though based on specific examples, the results of the previous section provide
reasonably justified answers to the questions we posed about the relationships
between the interpretation of belief functions based on the notion of non-negative
masses and coherent lower probabilities.

These answers can in turn be interpreted in two alternative ways: 

— from a probability-oriented point of view, they tend to exclude the possibility 
of giving a meaningful interpretation of belief functions as an autonomous 
specialized theory in the framework of imprecise probabilities. In this sense, 
the results of this paper would add to other similar past criticisms.

— from a TBM-oriented point of view, they tend to confirm that the search for 
a common conceptual basis with any form of probability theory is doomed 
to fail, since the two theories have definitely different roots. 

Grabisch has shown that the Möbius transform of a fuzzy measure yields the
coefficients of its multilinear polynomial extension (see [4] for details). While this
result supports an alternative mathematical interpretation of the mass values in 
a more abstract framework, which encompasses both uncertainty representation 
and multicriteria decision making, it does not seem to provide further hints for 
an intuitive interpretation of masses in the context of belief functions theory for 
the representation of uncertainty. 

We propose a further perspective: the difficulties related to the notion of mass 
suggest that the search for alternative intuitive interpretations of belief functions
should still be pursued. We believe that the more general theory of coherent 
imprecise probabilities may provide hints for this research goal. In particular, 
there is some evidence in the literature that the n-monotonicity property is 
somehow related to concepts of independence. We mention for this an example in 
5.13.4 of [21], where lack of independence between two tosses of a coin naturally 
leads to a lower probability evaluation, which cannot be n-monotone. Further,
it is shown in 4.1 of [19] that a general notion of epistemic independence is 
consistent with a concept of n-monotonicity, termed external n-monotonicity. 

Acknowledgments. We thank the referees for their helpful comments. 

Abstract. In this paper, we study different concepts of conditional
independence for belief functions in the context of the transferable belief
model. We especially clarify the relationships between the concepts of
conditional non-interactivity, irrelevance and doxastic independence.
Conditional non-interactivity is defined by the ‘mathematical’ property useful
for computation considerations and corresponds to decomposability
of the belief functions. Conditional irrelevance is defined by a ‘common
sense’ property based on conditioning. Conditional doxastic independence
is defined by a particular form of irrelevance, the one preserved
under Dempster’s rule of combination.

Keywords: Conditional Independence, Conditional Irrelevance, Conditional
Non-interactivity, Belief Functions, Transferable Belief Model.



1 Introduction 

The essence of conditional independence (CI) can be identified with a common
structure consisting of some basic properties of the CI relation, called ‘graphoid
axioms’ [5]. These axioms convey the simple idea that when we learn an irrelevant
fact, the relevance relationships of all other facts remain unchanged [13].

The graphoid axioms were initially developed by Dawid [5] for probabilistic
conditional independence. In order to enhance the application of Probability
Theory to Artificial Intelligence, Pearl and Paz [14] established the connections
between probabilistic conditional independence and graphical representation.
The graphoid axioms are also satisfied by other non-probabilistic models
such as embedded multi-valued dependency models in relational databases [9], 
conditional independence in Spohn’s theory of ordinal conditional functions [10], 
qualitative conditional independence in Dempster-Shafer theory of belief func- 
tions partitions [16], possibilistic conditional independence [8], [22], conditional 
independence and irrelevance in connection to the theory of closed convex sets 
of probability measures [4], and conditional independence in valuation-based 
systems representing many different uncertainty calculi [17]. 

S. Benferhat and P. Besnard (Eds.): ECSQARU 2001, LNAI 2143, pp. 340-349, 2001. 

Unfortunately these axioms have not received a complete treatment in relation
to the theory of belief functions. For this purpose, we study the notion
of conditional independence between sets of variables when uncertainty is
expressed by belief functions as defined in the context of the transferable belief
model (TBM) [20], [19]. We present the concepts of conditional non-interactivity,
irrelevance and doxastic independence. We emphasize the distinction between
conditional irrelevance and conditional independence. Then, we show that, as in the
marginal case [2], conditional irrelevance alone does not imply conditional
non-interactivity. We also prove that conditional doxastic independence is equivalent
to conditional non-interactivity. In this paper, all theorems are stated without
proofs. The reader is referred to [3] for the proofs of these theorems.

The remainder of this paper is organized as follows. In section 2, we first intro- 
duce the necessary notations and terminologies. Then, after extending the defi- 
nition of evidential and cognitive independence to the conditional case (section 
3), we present our definitions of conditional non-interactivity (section 4), condi- 
tional irrelevance (section 5) and conditional doxastic independence (section 6) 
for belief functions. Finally, in section 7, we summarize the results achieved in 
this paper and point out some future directions. 

2 Notations and Terminologies 

The main purpose of the theory of belief functions, also known as Dempster- 
Shafer theory and theory of evidence, is to model someone’s degree of belief. 
Since its introduction by Shafer [15], many interpretations have been proposed. 
Among them, we can distinguish: the lower probability model [23], the hint 
model [12] and the transferable belief model (TBM) [20]. In this paper, we are 
only concerned with the TBM. 

Most of the needed definitions and properties were given previously in our
paper on marginal belief function independence [2]. In this section,
we just reproduce the important ones in order to help the reader.

2.1 Sets 

When authors discuss conditional independence, they begin with a set
$S$ of variables $X_1, X_2, \ldots, X_n$, then consider pairwise disjoint subsets of variables
$U, V, W$ where $U \subseteq S$, $V \subseteq S$ and $W \subseteq S$. The concepts of non-interactivity,
irrelevance and independence are then defined between $U$, $V$ and $W$. We will
not systematically repeat these preliminary definitions, and we will consider only
three variables, denoted $X, Y, Z$, with the understanding that each one represents
a variable whose domain is the product space of its related $X_i$ variables. We give
here some essential set notations.

— By convention, indexed variables like $x_i, y_j, z_k$ denote elements of their
domain whereas $x, y, z$ denote subsets of their domain.

— If $X, Y, Z$ are three variables, $XY$ denotes $X \times Y$ and $XYZ$ denotes
$X \times Y \times Z$.

— For $x \subseteq X$, $y \subseteq Y$, $(x, y)$ denotes the subset $w$ of $XY$ such that
$w = \{(x_i, y_j) : x_i \in x, y_j \in y\}$.

— For $x \subseteq X$, $y \subseteq Y$, $z \subseteq Z$, $(x, y, z)$ denotes the subset $w$ of $XYZ$ such that
$w = \{(x_i, y_j, z_k) : x_i \in x, y_j \in y, z_k \in z\}$.

— For $x \subseteq X$, $x^{\uparrow XY}$ is the cylindrical extension of $x$ on $XY$: $x^{\uparrow XY} = (x, Y)$.

— For $w \subseteq XY$, $w^{\downarrow X}$ is the projection of $w$ on $X$:
$w^{\downarrow X} = \{x_i : x_i \in X, x_i^{\uparrow XY} \cap w \neq \emptyset\}$.

— We assume that the variables $X, Y, Z$ are ‘logically independent’, by which
we mean that:

\[ (x_i, Y, Z) \cap (X, y_j, Z) \cap (X, Y, z_k) \neq \emptyset, \quad \forall x_i \in X, y_j \in Y, z_k \in Z . \]
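
The set notation above maps directly onto finite sets of tuples. The following minimal sketch, with hypothetical two- and three-element domains and hypothetical function names, implements the pair construction $(x, y)$, the cylindrical extension of a subset of $X$ on $XY$, and the projection of a subset of $XY$ on $X$.

```python
from itertools import product

# Hypothetical finite domains, chosen only for illustration.
X = frozenset({"x1", "x2"})
Y = frozenset({"y1", "y2", "y3"})


def pairs(x, y):
    """(x, y): every pair (xi, yj) with xi in x and yj in y."""
    return frozenset(product(x, y))


def cylindrical_extension(x):
    """Extension of x (a subset of X) on XY, i.e. (x, Y)."""
    return pairs(x, Y)


def projection_on_x(w):
    """Projection on X: the xi whose cylinder (xi, Y) meets w."""
    return frozenset(xi for xi in X if cylindrical_extension({xi}) & w)


x = frozenset({"x1"})
w = frozenset({("x1", "y1"), ("x2", "y3")})
print(sorted(cylindrical_extension(x)))
print(sorted(projection_on_x(w)))
```

Projecting a cylindrical extension gives back the original subset, whereas extending a projection generally gives a superset of the original set, as with `w` here.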



2.2 Belief Functions 

— We use the notation $m^{\Omega}[x]$ to represent the bba (shorthand for basic belief
assignment) $m$ defined on the domain $\Omega$ given the belief holder knows that
$x$ is true (i.e. $x$ holds). The symbol $m$ can be replaced by $bel$, $pl$, $q$ in order
to denote the belief function, the plausibility function and the commonality
function. The values taken by these functions at $w \subseteq \Omega$ are denoted by
$m^{\Omega}[x](w)$, $bel^{\Omega}[x](w)$, $pl^{\Omega}[x](w)$, $q^{\Omega}[x](w)$, respectively.

— In the TBM, none of these functions is necessarily normalized. When we want
to get the normalized forms, we use the upper-case notations $M$, $Bel$, $Pl$, $Q$.
These normalized functions are obtained by dividing the unnormalized functions
by the factor $1 - m(\emptyset)$ (putting $M(\emptyset) = 0$, $Bel(\emptyset) = 0$ and $Q(\emptyset) = 1$).

— Let $m^X$ be defined on the frame $X$. Then $m^{X \uparrow XY}$ is the bba defined on the
frame $X \times Y$ with:

\[ m^{X \uparrow XY}(w) = m^X(x) \text{ if } w = (x, Y), \quad = 0 \text{ otherwise.} \]

$m^{X \uparrow XY}$ is called a vacuous extension of $m^X$.

— Let $m^{XY}$ be defined on the frame $X \times Y$. Then $m^{XY \downarrow X}$ is the bba defined
on the frame $X$ with:

\[ m^{XY \downarrow X}(x) = \sum_{w \subseteq XY : \, w^{\downarrow X} = x} m^{XY}(w), \quad \forall x \subseteq X . \]

$m^{XY \downarrow X}$ is called the marginal of $m^{XY}$ on $X$.

— The $\oplus$ symbol represents Dempster’s rule of combination in its normalized
form and $\odot$ represents the conjunctive combination, i.e., the same operation
as Dempster’s rule of combination except that the normalization (the division by
$1 - m(\emptyset)$) is not performed. The conjunctive combination rule can be written
equivalently as:

\[ m_{1 \odot 2}(w) = m_1 \odot m_2 (w) = \sum_{w_1, w_2 \subseteq \Omega, \; w_1 \cap w_2 = w} m_1(w_1) \, m_2(w_2) \]

The next formula is very useful (see Smets [18]):

\[ f_{1 \odot 2}(w) = \sum_{w^* \subseteq \Omega} f_1[w^*](w) \, m_2(w^*), \quad \forall w \subseteq \Omega \tag{1} \]

where $f \in \{m, bel, pl, q\}$ and $f_1[w^*]$ is the result of the unnormalized
conditioning of $f_1$ on $w^* \subseteq \Omega$.

— Both $\oplus$ and $\odot$ operations are extended, so that they can be applied to two
bba’s $m_1$ and $m_2$ not defined on the same frame (the frames being nevertheless
compatible), so $m^X \odot m^Y$ is short for $m^{X \uparrow XY} \odot m^{Y \uparrow XY}$.

— $pl_1 \odot pl_2$ represents the plausibility function obtained from $m_1 \odot m_2$ where $m_1$
and $m_2$ are the bba’s related to $pl_1$ and $pl_2$, respectively (and similarly with
$bel$ and $q$).

— The set of belief functions defined on $\Omega$ is denoted by $BF_{\Omega}$.

— By abuse of language, we may omit the index and we will write statements
like $m \in BF_{\Omega}$ to mean that the belief function associated with $m$ belongs
to $BF_{\Omega}$.

— When convenient, bba’s $m$ on $\Omega$ are represented by the list of pairs $(w, x)$
where $w$ is a focal element of $m$ (a subset of $\Omega$ with a non null bbm),
and $x = m(w)$. So $((w_1, .4), (w_2, .6))$ represents the bba $m$ on $\Omega$ with
$m(w_1) = .4$ and $m(w_2) = .6$, and $w_1 \subseteq \Omega$ and $w_2 \subseteq \Omega$.
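
The operations on bba’s introduced in this section can be sketched with dictionaries that map focal elements (frozensets) to masses. The code below is a minimal illustration under assumed toy frames and mass values: it implements the vacuous extension from $X$ to $XY$, the marginal on $X$ of a bba on $XY$, the conjunctive combination, and its normalized form (Dempster’s rule). All names and numbers are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def vacuous_extension(m_x, Y):
    """Move each mass from x (subset of X) to the cylinder (x, Y)."""
    return {frozenset(product(x, Y)): v for x, v in m_x.items()}


def marginal_on_x(m_xy):
    """Add up the masses of the focal sets sharing the same projection on X."""
    out = defaultdict(Fraction)
    for w, v in m_xy.items():
        out[frozenset(xi for xi, _ in w)] += v
    return dict(out)


def conjunctive(m1, m2):
    """Unnormalized combination: products of masses go to intersections."""
    out = defaultdict(Fraction)
    for (w1, v1), (w2, v2) in product(m1.items(), m2.items()):
        out[w1 & w2] += v1 * v2
    return dict(out)


def dempster(m1, m2):
    """Normalized form; undefined when all the mass falls on the empty set."""
    m = conjunctive(m1, m2)
    conflict = m.pop(frozenset(), Fraction(0))
    return {w: v / (1 - conflict) for w, v in m.items()}


X = frozenset({"x1", "x2"})
Y = frozenset({"y1", "y2"})
m1 = {frozenset({"x1"}): Fraction(3, 5), X: Fraction(2, 5)}
m2 = {frozenset({"x2"}): Fraction(1, 2), X: Fraction(1, 2)}

combined = conjunctive(m1, m2)
normalized = dempster(m1, m2)
print(combined)
print(normalized)
print(marginal_on_x(vacuous_extension(m1, Y)) == m1)
```

In the conjunctive result the mass on the empty set measures the conflict between the two bba’s; Dempster’s rule removes it and rescales the remaining masses.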

3 Evidential and Cognitive Independence 

In the marginal case [2], we presented the notions of evidential independence
and cognitive independence for belief functions. These notions were first
introduced by Shafer [15] for the marginal case. In addition, it is shown in Shafer
[15] that evidential independence implies cognitive independence, but not the
reverse. In this section, we only consider evidential independence.

In the multivariate framework, Kong [11] studied the conditional case. He
defined the notion of evidential conditional independence of belief functions as
follows (remember that the variables X, Y, and Z are always pairwise disjoint
subsets of variables; see section 2.1):

Definition 1. Evidential Conditional Independence. Let $X$, $Y$ and $Z$ be
three variables. $X$ and $Y$ are (evidentially) conditionally independent given $Z$
with respect to $Bel^{XYZ}$ if and only if for all $x \subseteq X$, $y \subseteq Y$, $z_i \in Z$:

\[ Bel^{XYZ}[z_i]^{\downarrow XY}(x, y) = Bel^{XYZ}[z_i]^{\downarrow X}(x) \, Bel^{XYZ}[z_i]^{\downarrow Y}(y) \tag{2} \]

When $Z$ is not specified this becomes marginal evidential independence of $X$
and $Y$ (Definition 3 in [2]).

Almond ([1], page 114) calls this independence a strong conditional
independence and shows it is equivalent to:

Definition 2. Strong Conditional Independence. Let $X$, $Y$ and $Z$ be three
variables. $X$ and $Y$ are (strongly) conditionally independent given $Z$ with respect
to $Bel^{XYZ}$ if and only if

\[ Bel^{XYZ} = Bel^{XYZ \downarrow XZ} \oplus Bel^{XYZ \downarrow YZ} \tag{3} \]

Note that these definitions are based on normalized belief functions. When
we tolerate unnormalized belief functions, the term $Bel^{XYZ \downarrow Z}$ must be added
and the definition becomes as follows:

Definition 3. Strong Conditional Independence. Let $X$, $Y$ and $Z$ be three
variables. $X$ and $Y$ are (strongly) conditionally independent given $Z$ with respect
to $Bel^{XYZ}$ if and only if

\[ Bel^{XYZ} \oplus Bel^{XYZ \downarrow Z} = Bel^{XYZ \downarrow XZ} \oplus Bel^{XYZ \downarrow YZ} \tag{4} \]

This definition turns out to be equivalent to what we call hereafter
conditional non-interactivity. In the following sections, we present our definitions
of conditional non-interactivity, conditional irrelevance and conditional doxastic
independence.

4 Conditional Non-Interactivity 

We now focus on the decompositional independence definition for belief
functions. This definition is represented by non-interactivity, a mathematical
property useful for computation.

For the definition of conditional non-interactivity for belief functions, we
start from the belief on the joint product space XYZ. We marginalize it on XZ
and also on YZ. We combine these two marginal belief functions and require
the result to be equal to the initial belief (on XYZ) combined with its marginal on Z.

This last term results from the fact that the marginals on XZ and on YZ
both contain the marginal on Z, which is thus counted twice when
combining the marginals on XZ and on YZ. This term corresponds to the
$pl^{XY}(X, Y)$ term encountered when defining marginal non-interactivity (see
relation (6) in [2]). The formal definition is given as follows:

Definition 4. Conditional Non-interactivity. Given three variables X, Y
and Z, and $m^{XYZ} \in BF_{XYZ}$, X and Y are conditionally non-interactive given
Z with respect to $m^{XYZ}$, denoted by $X \perp_{m^{XYZ}} Y \mid Z$, if and only if

$$m^{XYZ \downarrow XZ} \mathbin{\textcircled{$\cap$}} m^{XYZ \downarrow YZ} = m^{XYZ} \mathbin{\textcircled{$\cap$}} m^{XYZ \downarrow Z} \tag{5}$$

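This definition of conditional non-interactivity (5) corresponds to Shenoy’s
factorization (see [17], lemma 3.1 (5)). It can also be reformulated in terms of
commonality functions as shown by Studeny [21].

Theorem 1. $X \perp_{m^{XYZ}} Y \mid Z$ iff for all $w \subseteq XYZ$

$$q^{XYZ}(w)\, q^{XYZ \downarrow Z}(w^{\downarrow Z}) = q^{XYZ \downarrow XZ}(w^{\downarrow XZ})\, q^{XYZ \downarrow YZ}(w^{\downarrow YZ}) \tag{6}$$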

It is interesting to note that Studeny [21] raises an objection to the
definition of conditional non-interactivity^1 in the framework of Dempster-Shafer
theory. He notices that the definition based on equation (5) is not consistent
with marginalization: it may happen that for two bba’s $m_1 \in BF_{XZ}$ and
$m_2 \in BF_{YZ}$ that share the same marginal on Z (i.e., $m_1^{\downarrow Z} = m_2^{\downarrow Z}$) there
exists no bba $m^{XYZ}$ on XYZ such that $m^{XYZ \downarrow XZ} = m_1$, $m^{XYZ \downarrow YZ} = m_2$ and
$X \perp_{m^{XYZ}} Y \mid Z$.

Nevertheless the next theorem shows that, for any $m^{XZ}$ and $m^{YZ}$, X and
Y are non-interactive given Z under $m^{XZ} \mathbin{\textcircled{$\cap$}} m^{YZ}$. The only subtlety is
that $m^{XZ}$ and $m^{YZ}$ are not the marginals of $m^{XZ} \mathbin{\textcircled{$\cap$}} m^{YZ}$ on XZ and YZ, respectively.

This property provides in fact a convenient way to build belief functions that
satisfy non-interactivity. Just take any pair of bba’s $m^{XZ}$ and $m^{YZ}$ and combine
them conjunctively; the result is a bba under which X and Y are conditionally
non-interactive given Z.

Theorem 2. Let $m^{XZ}$ and $m^{YZ}$ be two bba’s on XZ and YZ, respectively. Let
$m = m^{XZ} \mathbin{\textcircled{$\cap$}} m^{YZ}$; then $X \perp_m Y \mid Z$.
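The construction of Theorem 2 can be checked numerically on a small frame. The sketch below is only an illustration under our own assumptions: the binary domains, the two bba's `m_xz` and `m_yz`, the counter-example `m_bad`, and all function names are hypothetical and do not come from the paper. A bba is stored as a dictionary from focal sets (frozensets of configurations) to masses, and condition (5) is tested after vacuously extending each marginal to XYZ.

```python
from itertools import product

# Hypothetical toy domains (an assumption for illustration only).
DOMAINS = {"X": ("x1", "x2"), "Y": ("y1", "y2"), "Z": ("z1", "z2")}
XYZ = ("X", "Y", "Z")
FULL = [tuple(c) for c in product(*(DOMAINS[v] for v in XYZ))]

def marginalize(m, to_vars):
    """Project every focal set of a bba on XYZ onto the variables in to_vars."""
    idx = [XYZ.index(v) for v in to_vars]
    out = {}
    for focal, mass in m.items():
        key = frozenset(tuple(c[i] for i in idx) for c in focal)
        out[key] = out.get(key, 0.0) + mass
    return out

def extend(m, from_vars):
    """Vacuous extension: each focal set on from_vars becomes a cylinder in XYZ."""
    idx = [XYZ.index(v) for v in from_vars]
    out = {}
    for focal, mass in m.items():
        cyl = frozenset(c for c in FULL if tuple(c[i] for i in idx) in focal)
        out[cyl] = out.get(cyl, 0.0) + mass
    return out

def conj(m1, m2):
    """Unnormalized conjunctive combination of two bba's on the same frame."""
    out = {}
    for a, ma in m1.items():
        for b, mb in m2.items():
            out[a & b] = out.get(a & b, 0.0) + ma * mb
    return out

def non_interactive(m, tol=1e-12):
    """Test relation (5) for a bba m on XYZ."""
    lhs = conj(extend(marginalize(m, ("X", "Z")), ("X", "Z")),
               extend(marginalize(m, ("Y", "Z")), ("Y", "Z")))
    rhs = conj(m, extend(marginalize(m, ("Z",)), ("Z",)))
    return all(abs(lhs.get(k, 0.0) - rhs.get(k, 0.0)) <= tol
               for k in set(lhs) | set(rhs))

# Two arbitrary bba's on XZ and YZ (hypothetical numbers).
m_xz = {frozenset({("x1", "z1")}): 0.6,
        frozenset({("x1", "z1"), ("x2", "z2")}): 0.3,
        frozenset(product(DOMAINS["X"], DOMAINS["Z"])): 0.1}
m_yz = {frozenset({("y1", "z2"), ("y2", "z2")}): 0.5,
        frozenset({("y1", "z1")}): 0.2,
        frozenset(product(DOMAINS["Y"], DOMAINS["Z"])): 0.3}
m = conj(extend(m_xz, ("X", "Z")), extend(m_yz, ("Y", "Z")))

# A bba that ties X to Y inside z1 and therefore violates (5).
m_bad = {frozenset({("x1", "y1", "z1"), ("x2", "y2", "z1")}): 1.0}
```

Here `non_interactive(m)` holds for the combined bba, while `non_interactive(m_bad)` fails because the single focal set of `m_bad` links the values of X and Y.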

5 Conditional Irrelevance 

Before presenting the definition of conditional irrelevance for belief functions, we
explain the idea of two belief functions on XYZ that share the same marginals
on Z after having been conjunctively combined with a given bba m on XYZ.

The underlying idea is a problem of belief state distinguishability. Suppose
two agents hold beliefs on XYZ. Suppose You can only observe the beliefs
held by these two agents on Z (thus the marginal on Z of their bba’s). If these
two marginal bba’s are equal, You cannot distinguish between the beliefs held
by the two agents, even though their beliefs on XYZ may be different. One
way to distinguish them is to present to the two agents a new piece of evidence
which induces the bba m on XYZ. This m is then combined conjunctively
with the initial bba’s. The marginals on Z may or may not still be equal,
depending on m. So one way to distinguish between belief states which can only
be observed on Z is to produce various m and compare the marginals on Z.

For a given m on XYZ, we can consider all the belief functions on XYZ
which are indistinguishable on Z. These bba’s describe belief states that cannot
be distinguished after having been conjunctively combined with m by only
observing their marginals on Z. Thus m creates an equivalence relation on the set of
belief functions defined on XYZ.

^1 Studeny [21] uses the term ‘conditional independence’ rather than ‘conditional
non-interactivity’.

5.1 Indistinguishability on Z under m 

Let $R^Z(m)$ denote the set of pairs of belief functions on XYZ that are indistinguishable
on Z under m. Its formal definition is as follows:

Definition 5. Indistinguishability on Z under m. For any bba’s $m, m_1, m_2 \in BF_{XYZ}$,
$(m_1, m_2) \in R^Z(m)$ iff $(m \mathbin{\textcircled{$\cap$}} m_1)^{\downarrow Z} = (m \mathbin{\textcircled{$\cap$}} m_2)^{\downarrow Z}$.

In particular, we will use this concept of indistinguishability when $m \in BF_{XYZ}$
and $m_1, m_2 \in BF_{YZ}$, which is just a particular case of the definition.
The reason is that we will define conditional irrelevance as the fact that
the belief on XZ is influenced by the belief on YZ only through the impact of
this last belief on Z, and not through the details of how it is distributed on YZ.

5.2 Definition of Conditional Irrelevance 

Let $m \in BF_{XYZ}$. Suppose that we study the impact of any bba $m_i \in BF_{YZ}$ on
our belief on XZ, i.e., we study $(m \mathbin{\textcircled{$\cap$}} m_i)^{\downarrow XZ}$. Suppose the impact of $m_i$ on m is
fully captured by its impact on Z. By that we mean that the impact of $m_i$ defined
on YZ and the impact of any other $m_j$ defined on YZ with $(m_i, m_j) \in R^Z(m)$
are equal when it comes to the belief induced on XZ. Equivalently, it means that
all that counts for our beliefs on XZ after we combine m with $m_i$
is the belief induced by $m \mathbin{\textcircled{$\cap$}} m_i$ on Z. Further details on the beliefs on YZ are
irrelevant.

In that case, we say that Y is conditionally irrelevant to X given Z with
respect to m. Formally, we have the following definition:

Definition 6. Conditional irrelevance. Let $m \in BF_{XYZ}$. Y is conditionally
irrelevant to X given Z with respect to m, denoted by $IR_m(X, Y \mid Z)$, if and only
if for all $m_1, m_2 \in BF_{YZ}$ with $(m_1, m_2) \in R^Z(m)$ we have

$$(m \mathbin{\textcircled{$\cap$}} m_1)^{\downarrow XZ} = (m \mathbin{\textcircled{$\cap$}} m_2)^{\downarrow XZ} \tag{7}$$

Notice that conditional irrelevance alone does not imply conditional
non-interactivity between variables.

6 Conditional Doxastic Independence

In the probabilistic framework, it can be easily proved that the independence and
irrelevance concepts are equivalent. However, in the belief function framework,
the situation is not as simple: irrelevance alone does not imply independence.

In the marginal case [2], we have defined that two variables are doxastically 
independent when they are irrelevant and this irrelevance is preserved under 
Dempster’s rule of combination. Then we have proved that non-interactivity 
and doxastic independence are equivalent. 

In this section, we show that the notion of doxastic independence^2 defined
in the marginal case can be extended to the conditional case. We also state the
theorem establishing the equivalence between conditional doxastic independence
and conditional non-interactivity.



6.1 Irrelevance Preservation under Conjunctive Combination 

Just as in the marginal case, we feel that conditional doxastic independence
requires not only the conditional irrelevance property, but also that this property
be preserved when combining two belief functions that satisfy it. The idea fits
the following scenario: if two agents claim that X and Y are conditionally
doxastically independent given Z, then this conditional independence should be
preserved when the belief functions representing the agents’ beliefs are
conjunctively combined.

So conditional doxastic independence is irrelevance plus irrelevance
preservation under conjunctive combination, denoted $IRP_{\textcircled{$\cap$}}$. Formally, the last property
is defined as follows:

Definition 7. Irrelevance preservation under conjunctive combination.
Given $m_1, m_2 \in BF_{XYZ}$, we say that $m_1$ and $m_2$ satisfy $IRP_{\textcircled{$\cap$}}$ if $IR_{m_1}(X, Y \mid Z)$
and $IR_{m_2}(X, Y \mid Z)$ imply $IR_{m_1 \textcircled{$\cap$} m_2}(X, Y \mid Z)$.



6.2 Definition of Conditional Doxastic Independence 

The notion of doxastic independence defined in the marginal case can be ex- 
tended to the conditional case by the following definition. 

Definition 8. Conditional Doxastic Independence. Given three variables
X, Y and Z, and $m \in BF_{XYZ}$. The variables X and Y are doxastically independent
given Z with respect to m, denoted by $X \perp\!\!\!\perp_m Y \mid Z$, if and only if m satisfies

- $IR_m(X, Y \mid Z)$

- $\forall m_0 \in BF_{XYZ}$ : $IR_{m_0}(X, Y \mid Z) \Rightarrow IR_{m \textcircled{$\cap$} m_0}(X, Y \mid Z)$

^2 We use here the term ‘doxastic independence’ to distinguish between
probabilistic independence and belief function independence. In Greek, ‘doxein’
means ‘to believe’.

6.3 Equivalence between $\perp$ and $\perp\!\!\!\perp$

We state the following theorem establishing the equivalence between conditional
doxastic independence and conditional non-interactivity. The proof of this theorem
is in [3].

Theorem 3. $X \perp_m Y \mid Z$ iff $X \perp\!\!\!\perp_m Y \mid Z$.

7 Conclusion 

We have studied the concept of conditional belief function independence in the 
context of the transferable belief model, pointing out different ways to tackle the 
problem leading to the following definitions: 

- Conditional non-interactivity: the joint belief function can be rebuilt from
its marginals.

- Conditional irrelevance: the belief on XZ depends on any belief over YZ
only through the impact of the last belief function on Z.

- Conditional irrelevance preservation under conjunctive combination rule: if
two belief functions satisfy conditional irrelevance, then their conjunctive
combination also satisfies conditional irrelevance.

- Conditional doxastic independence: defined as conditional irrelevance that is
preserved under conjunctive combination rule.

The major result of this study is that conditional non-interactivity and con- 
ditional doxastic independence are equivalent. 

However, much work remains to be done in this field, such as
the study of the properties of conditional products [7] for belief function
theory, the links between our concept of conditional doxastic independence and
the concept of separoid recently introduced by Dawid [6], and finally, the impact
of conditional doxastic independence on graphical representation.

Abstract. We develop a method to evaluate the reliability of a sensor
in a classification task when the uncertainty is represented by belief
functions as understood in the transferable belief model.

This reliability is represented by a discounting factor that minimizes 
the distance between the pignistic probabilities computed from the dis- 
counted beliefs and the actual values of the data in a learning set. 

We then describe a method to tune the discounting factors of several 
sensors when their reports are merged in order to reach an aggregated 
report. They are computed so that together they minimize the distance 
between the pignistic probabilities computed from the combined dis- 
counted belief functions and the actual values of the data in a learning 
set. 

The first method produces the reliability of a sensor considered alone. 
The second method considers a set of sensors, and weights each of them 
so that together they produce the best predictor. 



1 Introduction 

The belief function theory, in particular the Transferable Belief Model (TBM),
is increasingly used to represent and deal with uncertainty. It can be seen
as a generalization of subjective probability theory. The TBM makes it possible to handle
data collected from partially reliable sensors. It can represent full knowledge, partial
ignorance, and even total ignorance. The conjunctive rule of combination provides the tool to
aggregate the reports produced by several sensors in order to get their merged
report. It seems well suited to multisensor data fusion [14].

Sensors use different approaches and types of measurements and work in
different environments, and their reliability can vary from one to the other. One
way to take into consideration the reliability and applicability of a sensor consists
in weighting (discounting) its reports.

In the TBM, a sensor’s report about the actual value of a variable is
represented by a belief function. The reliability of the sensor is represented by a
discounting factor, i.e., a coefficient that ‘weights’ the belief function produced
by the sensor. Reliability and discounting are linked by the idea that if a sensor
is felt to be unreliable by the user, he/she will discount what the sensor states.
Discounting can be understood as ‘partially disregarding’. The smaller the reliability,
the larger the discounting. The discounting factor is a well-defined concept in
belief function theory (see section 2.3), whereas reliability will be used
informally hereafter.

This paper addresses the problem of assessing the discounting factor to be 
applied to the beliefs generated by the sensors. We develop two methods appli- 
cable in two contexts. The first consists in assessing the discounting factor to be 
applied to one sensor by comparing its report (represented by a belief function) 
with the actual values. The second consists in assessing the values of the dis- 
counting factor to be given to each of several sensors when their reports must 
be merged. It is obtained by comparing the merged discounted belief function 
with the actual values. 

The first method concerns one sensor; the second concerns a group of sensors
that must jointly produce an aggregated decision.

It may seem odd that we speak of ‘beliefs held by a sensor’, but the term
belief is to be taken in a neutral way. No philosophical or psychological
connotation is intended. It is just a tradition that the functions that represent
the sensor report are called ‘belief functions’, hence the term ‘belief’. Classically,
sensors produce likelihoods. Here we just replace the term likelihood by belief,
which emphasizes that we use belief functions and not probability functions.

This paper is composed as follows. We start by giving an overview of the 
basics of the TBM. Next, we present the multisensor data fusion within the be- 
lief function formalism. We then describe the two methods for assessing sensor 
reliabilities and for tuning them. Each method is illustrated by an example ex- 
plaining its unfolding. 

In this paper, we speak of sensors, but all we present here can be applied di- 
rectly to other problems, like expert opinion pooling. An expert is just a sensor, 
and his/her opinion is equivalent to a sensor report. Data fusion and opinion 
pooling are analogous. 

Experts differ in level of expertise; some of them are more reliable than
others due to their better knowledge, training, experience, intelligence, and so on. To
express their opinions, experts may use different backgrounds, methodologies and
even knowledge. Hence the necessity to consider the experts’ reliability when
receiving their opinions; consequently these judgments must be
appropriately ‘discounted’.

Thus, the concepts of expert, opinion and expert opinion pooling are
equivalent to those of sensor, report and data fusion. The methods presented in this
paper can be applied directly to this other domain. Note that other researchers
have proposed ways to assess experts’ discounting factors within the belief function
theory; we mention in particular the one developed by Zouhal and Denoeux [15].

2 Belief Function Theory

In this section, we briefly review the basics of the belief function theory as
interpreted in the Transferable Belief Model (TBM). For a more detailed explanation
and other basics see [6,10,11].

2.1 Definitions 

Let $\Theta$ be a finite set of elementary and mutually exclusive hypotheses related to
a given problem domain. It is called the frame of discernment. One element of $\Theta$,
denoted $\theta_0$, corresponds to the actual value. This actual value is not known
by the belief holder (the sensor).

A basic belief assignment (bba) is a function m from the power set of $\Theta$,
denoted $2^\Theta$, to [0, 1] verifying:

$$\sum_{A \subseteq \Theta} m(A) = 1 \tag{1}$$

The basic belief mass (bbm) $m(A)$ given to $A \subseteq \Theta$ is the amount of belief
specifically assigned to the event $\theta_0 \in A$ and that cannot support any subset of
$\Theta$ more specific than A.

The belief function (bel) represents the belief assigned to an event $A \subseteq \Theta$.
It is equal to the sum of the bbm’s committed to the non-empty subsets of A. To each bba m,
there corresponds a belief function $bel : 2^\Theta \to [0, 1]$, defined
by:

$$bel(A) = \sum_{\emptyset \neq B \subseteq A} m(B), \quad \forall A \subseteq \Theta. \tag{2}$$

A vacuous belief function is such that $m(\Theta) = 1$ and $m(A) = 0$, $\forall A \subseteq \Theta$,
$A \neq \Theta$. It represents a state of total ignorance.
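As a minimal sketch of these definitions, a bba can be stored as a dictionary from focal sets to masses. The representation and the function name `bel` are our own choices for illustration. The frame and the masses are those of the first report of sensor S1 in Example 1 (section 4.6).

```python
THETA = frozenset({"Airplane", "Helicopter", "Rocket"})

# A bba maps subsets of the frame (as frozensets) to masses that sum to 1.
m = {
    frozenset({"Rocket"}): 0.5,
    frozenset({"Helicopter", "Rocket"}): 0.3,
    THETA: 0.2,
}

# The vacuous belief function puts all the mass on the whole frame.
vacuous = {THETA: 1.0}

def bel(m, a):
    """bel(A): total mass of the non-empty focal sets included in A, relation (2)."""
    a = frozenset(a)
    return sum(mass for b, mass in m.items() if b and b <= a)

print(bel(m, {"Helicopter", "Rocket"}))  # mass of {Rocket} plus mass of {Helicopter, Rocket}
print(bel(vacuous, {"Rocket"}))          # total ignorance supports no strict subset
```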

2.2 Combination 

Consider two pieces of evidence on the same frame $\Theta$ represented by the two bbas
$m_1$ and $m_2$. The joint bba quantifying the combined impact of these two pieces
of evidence is obtained through the conjunctive combination rule as follows [8]:

$$(m_1 \mathbin{\textcircled{$\cap$}} m_2)(A) = \sum_{B, C \subseteq \Theta : B \cap C = A} m_1(B)\, m_2(C) \tag{3}$$

where $\textcircled{$\cap$}$ denotes the operator of conjunction. The classical Dempster’s rule of
combination is the conjunctive combination rule where the result is normalized
by dividing each term by $(1 - (m_1 \mathbin{\textcircled{$\cap$}} m_2)(\emptyset))$. It is defined, for $A \neq \emptyset$, as:

$$(m_1 \oplus m_2)(A) = K \sum_{B, C \subseteq \Theta : B \cap C = A} m_1(B)\, m_2(C) \tag{4}$$

where

$$K^{-1} = 1 - \sum_{B, C \subseteq \Theta : B \cap C = \emptyset} m_1(B)\, m_2(C) \tag{5}$$

and

$$(m_1 \oplus m_2)(\emptyset) = 0. \tag{6}$$

K is called the normalization factor.

The conjunctive combination rule and Dempster’s rule of combination are 
commutative and associative, so we can combine several belief functions itera- 
tively and in any order. 
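The two rules can be sketched in a few lines. The dictionary representation and the function names `conjunctive` and `dempster` are our own illustrative choices. The two bbas are the reports of sensors S1 and S2 on object o1 from Example 1 (section 4.6).

```python
def conjunctive(m1, m2):
    """Conjunctive rule (3): mass products go to the intersection of the focal sets."""
    out = {}
    for b, mb in m1.items():
        for c, mc in m2.items():
            out[b & c] = out.get(b & c, 0.0) + mb * mc
    return out

def dempster(m1, m2):
    """Dempster's rule (4)-(6): conjunctive rule followed by normalization."""
    joint = conjunctive(m1, m2)
    conflict = joint.pop(frozenset(), 0.0)  # mass given to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    k = 1.0 / (1.0 - conflict)              # normalization factor K of relation (5)
    return {a: k * mass for a, mass in joint.items()}

THETA = frozenset({"Airplane", "Helicopter", "Rocket"})
m1 = {frozenset({"Rocket"}): 0.5, frozenset({"Helicopter", "Rocket"}): 0.3, THETA: 0.2}
m2 = {frozenset({"Airplane", "Helicopter"}): 0.7, THETA: 0.3}

joint = conjunctive(m1, m2)
normalized = dempster(m1, m2)
```

In this instance `joint` gives 0.35 to the empty set, since {Rocket} and {Airplane, Helicopter} are disjoint; `normalized` redistributes that mass proportionally over the other focal sets.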



2.3 Discounting 

Reliability, i.e. our opinion about the ‘value’ of a sensor, varies from sensor to
sensor. The idea is to weight more heavily the reports produced by the ‘best’
sensors and less heavily those of the ‘bad’ ones. For $\alpha \in [0, 1]$, let $(1 - \alpha)$ be the degree
of ‘confidence’ we assign to the sensor. It can be encoded into a bba defined on
the set {reliable, not reliable} such that [9]:

$$m(\text{reliable}) = 1 - \alpha \quad \text{and} \quad m(\text{not reliable}) = \alpha \tag{7}$$

Suppose the bba m on $\Theta$ represents the sensor report about the actual value
$\theta_0$. The result of combining the sensor report with the bba given in (7) is a
new bba, denoted $m^\alpha$, defined as:

$$m^\alpha(A) = (1 - \alpha)\, m(A) \quad \text{for } A \subset \Theta \tag{8}$$

$$m^\alpha(\Theta) = \alpha + (1 - \alpha)\, m(\Theta) \tag{9}$$

This operation is called discounting by Shafer [6] and the coefficient $\alpha$ is
called the discounting factor. The larger $\alpha$, the closer $m^\alpha$ is to the vacuous
belief function.
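Relations (8) and (9) translate directly into code. This is a minimal sketch: the dictionary representation of a bba and the function name `discount` are our own assumptions. The bba is the first report of sensor S1 in Example 1.

```python
def discount(m, alpha, theta):
    """Discount the bba m by the factor alpha, following relations (8) and (9)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Relation (8): every strict subset of the frame keeps a share (1 - alpha).
    out = {a: (1.0 - alpha) * mass for a, mass in m.items() if a != theta}
    # Relation (9): the withdrawn mass is transferred to the whole frame.
    out[theta] = alpha + (1.0 - alpha) * m.get(theta, 0.0)
    return out

THETA = frozenset({"Airplane", "Helicopter", "Rocket"})
m = {frozenset({"Rocket"}): 0.5, frozenset({"Helicopter", "Rocket"}): 0.3, THETA: 0.2}

half = discount(m, 0.5, THETA)    # mass on the frame becomes 0.2 + 0.8 * 0.5
fully = discount(m, 1.0, THETA)   # alpha = 1 yields the vacuous belief function
```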



2.4 Pignistic Transformation 

To make decisions in the TBM, we build a probability function BetP on $\Theta$, called
the pignistic probability function, by applying the pignistic transformation [10].
It is defined by:

$$BetP(A) = \sum_{B \subseteq \Theta} \frac{|A \cap B|}{|B|}\, \frac{m(B)}{1 - m(\emptyset)}, \quad \forall A \subseteq \Theta \tag{10}$$
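For singletons, relation (10) amounts to sharing each mass equally among the elements of its focal set. The sketch below assumes a bba stored as a dictionary from frozensets to masses; the function name `betp` is ours. The empty set contributes nothing to the sum and only enters through the normalization term. The bba is the first report of sensor S1 in Example 1.

```python
def betp(m, theta):
    """Pignistic probability of each element of the frame, relation (10)."""
    norm = 1.0 - m.get(frozenset(), 0.0)
    if norm <= 0.0:
        raise ValueError("all the mass is on the empty set")
    return {
        x: sum(mass / len(b) for b, mass in m.items() if x in b) / norm
        for x in theta
    }

THETA = frozenset({"Airplane", "Helicopter", "Rocket"})
m = {frozenset({"Rocket"}): 0.5, frozenset({"Helicopter", "Rocket"}): 0.3, THETA: 0.2}

bp = betp(m, THETA)
# Helicopter receives half of 0.3 and a third of 0.2.
```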







Z. Elouedi, K. Mellouli, and P. Smets 



3 Multisensor Data Fusion with the TBM 

Multisensor systems can be used for the detection, localization and recognition
of objects in a given area [1]. Handling information collected by different sensors
requires an evidence gathering process, called a multisensor data fusion process,
in order to get, hopefully, ‘better’ information. The TBM offers a formal way
to combine sensor data, namely the conjunctive combination rule.

As mentioned before, sensors do not usually have the same level of reliability, 
so before pooling sensor reports (hence combining their belief functions), each 
belief function should be discounted to take into account the sensor reliability 
represented by the discounting factor. When these discounting factors are not 
known, they must be assessed. We propose two methods for such an assessment 
which correspond to the two different contexts mentioned in the introduction. 



4 Evaluating Sensor’s Reliability 

4.1 Introduction 

Finding an ‘automatic’ method to assess the sensor’s reliability relative to a
given problem requires information regarding the judgments given previously by
the sensor concerning ‘past’ events (related to the same problem) for which the
truth is known by us and not by the sensor. Then, a comparison between the
truth and the sensor’s judgments allows us to derive the reliability of the sensor.

In practice, one domain where we can get this kind of information is that of
classification problems^1. In such problems, we can get the sensor’s
reports on the class to which an object belongs, a class otherwise well known
by us. In the following subsections, we focus on classification problems. The
method can easily be adapted to other domains, the underlying schema being
quite general.

4.2 The Framework 

Let T be a set composed of n objects denoted by $o_j$ (j = 1, 2, ..., n). Each object
belongs to exactly one of the possible classes relative to the given problem. The
set of classes is defined by $\Theta = \{\theta_1, \theta_2, \ldots, \theta_p\}$. For each object $o_j$, we know its
class, denoted $c_j$ with $c_j \in \Theta$, and the sensor produces a bba, denoted $m^\Theta\{o_j\}$,
on $\Theta$ that represents its opinion on the actual value of $c_j$.

4.3 Assessing the Discounting Factor 

The first method considers one sensor for which we want to assess its reliability,
thus its discounting factor. This is done by comparing the bbas produced by the
sensor about the class of each of the n objects with their actual classes.

If we knew the discounting factor $\alpha$ applicable to a sensor, we would discount
the bba it generates by this discounting factor. So we would compute the bba
$m^{\Theta,\alpha}\{o_j\}$ using relations (8) and (9).

If we had to decide which class object $o_j$ belongs to, we would then
compute the pignistic probability from $m^{\Theta,\alpha}\{o_j\}$. Let the result be denoted by
$BetP^{\Theta,\alpha}\{o_j\}$. This probability function is then to be compared with the
actual value $c_j$ of object $o_j$. Let the indicator function $\delta$ be defined as $\delta_{j,i} = 1$ if
$c_j = \theta_i$ and 0 otherwise.

The distance between the pignistic probability computed from the discounted
sensor’s report and the indicator function $\delta$ is used as a measure of the reliability
of the sensor as far as object $o_j$ is concerned, and the sum of these distances over the n objects is
used as a measure of the overall reliability of the sensor. It is denoted TotalDist
and defined as:

$$TotalDist = \sum_{j=1}^{n} \sum_{i=1}^{p} \left( BetP^{\Theta,\alpha}\{o_j\}(\theta_i) - \delta_{j,i} \right)^2$$

We then define the reliability of the sensor as $(1 - \alpha) \in [0, 1]$ where $\alpha$ minimizes
TotalDist, i.e., the $\alpha$ that makes the discounted opinions of the sensor as good
as possible, thus that makes the values of $BetP^{\Theta,\alpha}\{o_j\}(\theta_i)$ as close as possible to
$\delta_{j,i}$.

^1 Several classification methods have been developed using belief function basics, like
the one proposed by Denoeux [2] and the one proposed by Elouedi et al. [3].



4.4 Explicit Computation with Normalized Belief Functions 

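In the special but common case where all bbas are normalized (thus $m(\emptyset) = 0$),
it is possible to express the value of $\alpha$ explicitly from the initial bbas $m^\Theta\{o_j\}$. Let
$BetP^\Theta\{o_j\}$ be the pignistic probability function computed from $m^\Theta\{o_j\}$, hence
before discounting. The solution for $\alpha$ is given in the next theorem.

Theorem 1. Let $m^\Theta\{o_j\}$ be a set of normalized bbas defined on the set of classes
$\Theta = \{\theta_1, \ldots, \theta_p\}$ for objects $o_j$, j = 1, ..., n. Let the indicator function $\delta_{j,i} = 1$
if the object $o_j$ belongs to the class $\theta_i$, and 0 otherwise. The discounting factor
$\alpha$ that minimizes:

$$TotalDist = \sum_{j=1}^{n} \sum_{i=1}^{p} \left( BetP^{\Theta,\alpha}\{o_j\}(\theta_i) - \delta_{j,i} \right)^2$$

where $BetP^{\Theta,\alpha}\{o_j\}$ is the pignistic probability function computed from the
discounted bba $m^{\Theta,\alpha}\{o_j\}$, is given by:

$$\alpha = \min\left(1, \max\left(0, \frac{\sum_{j=1}^{n} \sum_{i=1}^{p} \left(\delta_{j,i} - BetP^\Theta\{o_j\}(\theta_i)\right) BetP^\Theta\{o_j\}(\theta_i)}{n/p - \sum_{j=1}^{n} \sum_{i=1}^{p} BetP^\Theta\{o_j\}(\theta_i)^2}\right)\right)$$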

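Proof. Given $m^\Theta\{o_j\}$, we have:

$$m^{\Theta,\alpha}\{o_j\}(\theta) = \begin{cases} (1 - \alpha)\, m^\Theta\{o_j\}(\theta) & \text{if } \theta \subset \Theta \\ (1 - \alpha)\, m^\Theta\{o_j\}(\Theta) + \alpha & \text{if } \theta = \Theta \end{cases}$$

The pignistic probability $BetP^{\Theta,\alpha}\{o_j\}$ computed from $m^{\Theta,\alpha}\{o_j\}$ can be
expressed as a function of the pignistic probability $BetP^\Theta\{o_j\}$ computed directly
from $m^\Theta\{o_j\}$. For simplicity’s sake, we omit the $\{o_j\}$ index hereafter. One has:

$$BetP^\Theta(\theta_i) = \sum_{\theta \subseteq \Theta : \theta_i \in \theta} \frac{m^\Theta(\theta)}{|\theta|}$$

$$BetP^{\Theta,\alpha}(\theta_i) = \sum_{\theta \subseteq \Theta : \theta_i \in \theta} \frac{m^{\Theta,\alpha}(\theta)}{|\theta|} = \sum_{\theta \subseteq \Theta : \theta_i \in \theta} \frac{(1 - \alpha)\, m^\Theta(\theta)}{|\theta|} + \alpha/p = (1 - \alpha)\, BetP^\Theta(\theta_i) + \alpha/p$$

For simplicity’s sake, we write $P_{ij} = BetP^\Theta\{o_j\}(\theta_i)$. The term to be minimized
becomes:

$$TotalDist = \sum_{j=1}^{n} \sum_{i=1}^{p} \left( BetP^{\Theta,\alpha}\{o_j\}(\theta_i) - \delta_{j,i} \right)^2 = \sum_{j=1}^{n} \sum_{i=1}^{p} \left( (1 - \alpha) P_{ij} + \alpha/p - \delta_{j,i} \right)^2$$

Its extremum is reached when its derivative is null, hence when:

$$0 = \frac{d\, TotalDist}{d\alpha} = 2 \sum_{j,i} \left( (1 - \alpha) P_{ij} + \alpha/p - \delta_{j,i} \right) \left( -P_{ij} + 1/p \right) \tag{11}$$

$$\propto -(1 - \alpha) \sum_{j,i} P_{ij}^2 - \alpha n/p + \sum_{j,i} \delta_{j,i} P_{ij} + (1 - \alpha) n/p + \alpha n/p - n/p \tag{12}$$

$$= -(1 - \alpha) \sum_{j,i} P_{ij}^2 - \alpha n/p + \sum_{j,i} \delta_{j,i} P_{ij} \tag{13}$$

Thus,

$$\alpha = \frac{\sum_{j,i} (\delta_{j,i} - P_{ij})\, P_{ij}}{n/p - \sum_{j,i} P_{ij}^2}$$

□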



Once the discounting factors are computed for several sensors, the observed 
values can be used to order several sensors: the smaller the value the better the 
sensor. It could be used to select optimal sensors. It can also be used to discount 
the reports produced by the sensor in the future. 
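The closed form of Theorem 1 can be cross-checked against a direct scan of TotalDist. The sketch below works on a matrix of undiscounted pignistic probabilities. The matrix `P`, the class indices in `truth`, and the function names are hypothetical values chosen for illustration, not data from the paper.

```python
def closed_form_alpha(P, truth):
    """Theorem 1. P[j][i] is the undiscounted BetP of class i for object j,
    truth[j] is the index of the actual class of object j."""
    n, p = len(P), len(P[0])
    sum_sq = sum(x * x for row in P for x in row)
    sum_hit = sum(P[j][truth[j]] for j in range(n))   # sum of delta * P
    denom = n / p - sum_sq
    if denom == 0.0:   # every BetP is uniform, so TotalDist does not depend on alpha
        return 0.0
    return min(1.0, max(0.0, (sum_hit - sum_sq) / denom))

def total_dist(P, truth, alpha):
    """TotalDist for normalized bbas, using BetP^alpha = (1 - alpha) * BetP + alpha / p."""
    p = len(P[0])
    return sum(
        ((1.0 - alpha) * x + alpha / p - (1.0 if i == truth[j] else 0.0)) ** 2
        for j, row in enumerate(P)
        for i, x in enumerate(row)
    )

# Hypothetical pignistic probabilities for 3 objects and 3 classes.
P = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]
truth = [0, 2, 1]   # the sensor is right on the first object only

alpha = closed_form_alpha(P, truth)
grid_best = min((k / 1000 for k in range(1001)), key=lambda a: total_dist(P, truth, a))
```

Since TotalDist is a convex quadratic in the discounting factor, the grid minimizer lies within one grid step of the closed-form value. With `truth = [0, 2, 2]` the unconstrained minimizer is negative and the result is clipped to 0.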



4.5 The Simplified Equivalent 

Usually, probabilities are easier to understand than discounting factors. So, to
get a feeling of what the value $\alpha$ of a discounting factor represents,
we consider a highly simplified case where there are just 2 objects that can be
either a or b. Both are a’s. You are sure that object 1 is a, but have a probability
$\pi < 0.5$ that object 2 is a and $1 - \pi$ that it is b. So you are right for object 1 and
quite wrong for object 2 (with probability $1 - \pi$). If $\pi$ were known, we could
compute its related discounting factor by applying the previous relations. When
$\pi$ is unknown but $\alpha$ is known, we can compute the value of $\pi$ that underlies $\alpha$
in our simplified schema. For $\alpha \in [0, 1)$, compute

$$\pi = \frac{3 - 2\alpha - \sqrt{1 + 4\alpha - 4\alpha^2}}{4 - 4\alpha}$$

This is the value of $\pi$ that would produce $\alpha$ in the simplified schema where we
deal with only two objects and two classes and where the sensor is only uncertain
about the class of one object. This $\pi$ represents the probability that the sensor
is correct and that would induce a discounting factor equal to $\alpha$.
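The conversion can be sketched as follows. The function `pi_from_alpha` implements the formula of this section. The function `alpha_from_pi` is our own substitution of the simplified schema (n = 2, p = 2, pignistic probabilities (1, 0) and ($\pi$, 1 - $\pi$), both objects of class a) into Theorem 1; it is not stated in the paper and is included only to check the round trip.

```python
import math

def alpha_from_pi(pi):
    """Discounting factor given by Theorem 1 in the two-object schema
    (our own derivation, valid for 0 <= pi <= 0.5)."""
    return (1 - 3 * pi + 2 * pi * pi) / (1 - 2 * pi + 2 * pi * pi)

def pi_from_alpha(alpha):
    """Equivalent probability pi for a discounting factor alpha in [0, 1)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    root = math.sqrt(1 + 4 * alpha - 4 * alpha * alpha)
    return (3 - 2 * alpha - root) / (4 - 4 * alpha)

print(round(pi_from_alpha(0.68), 2))  # 0.21
print(round(pi_from_alpha(0.52), 2))  # 0.28
```

The two printed values are the equivalents quoted for the sensors of Example 1.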



4.6 Example 1 

Suppose there are two sensors $S_1$, $S_2$ applied to classify aerial targets. The
possible classes are: $\Theta$ = {Airplane, Helicopter, Rocket}. In order to find the
degree of reliability of these two sensors, table 1 presents their reports on the
classes of 4 objects whose classes are known by us (a part of a learning
set), but not by the sensors $S_1$, $S_2$. The first row of the table gives the
actual class of each object; then we present the two sensors’ bbas on the classes
of these objects (since they do not know the truth).



Table 1. The sensors’ bbas and the truth

                                   o1         o2           o3         o4
Truth                              Airplane   Helicopter   Airplane   Rocket

S1
∅                                  0          0            0          0
Airplane                           0          0            0          0
Helicopter                         0          0.5          0.4        0
Rocket                             0.5        0.2          0          0
Airplane ∪ Helicopter              0          0            0          0
Airplane ∪ Rocket                  0          0            0.6        0.6
Helicopter ∪ Rocket                0.3        0            0          0.4
Airplane ∪ Helicopter ∪ Rocket     0.2        0.3          0          0

S2
∅                                  0          0            0          0
Airplane                           0          0.3          0.2        0
Helicopter                         0          0            0          0
Rocket                             0          0            0          0
Airplane ∪ Helicopter              0.7        0.4          0          0
Airplane ∪ Rocket                  0          0            0          0
Helicopter ∪ Rocket                0          0            0.6        1
Airplane ∪ Helicopter ∪ Rocket     0.3        0.3          0.2        0

Assume that the discounting factors assigned to the two sensors $S_1$ and $S_2$
are respectively $\alpha_1$ and $\alpha_2$.

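Let’s focus on the first sensor. We have to update the bbas relative to the
objects $o_1$, $o_2$, $o_3$ and $o_4$ by taking into account $\alpha_1$. We get:

$m_{S_1}^{\Theta,\alpha_1}\{o_1\}(Rocket) = 0.5(1 - \alpha_1)$, $m_{S_1}^{\Theta,\alpha_1}\{o_1\}(Helicopter \cup Rocket) = 0.3(1 - \alpha_1)$,
$m_{S_1}^{\Theta,\alpha_1}\{o_1\}(\Theta) = 0.2 + 0.8\alpha_1$

$m_{S_1}^{\Theta,\alpha_1}\{o_2\}(Helicopter) = 0.5(1 - \alpha_1)$, $m_{S_1}^{\Theta,\alpha_1}\{o_2\}(Rocket) = 0.2(1 - \alpha_1)$,
$m_{S_1}^{\Theta,\alpha_1}\{o_2\}(\Theta) = 0.3 + 0.7\alpha_1$

$m_{S_1}^{\Theta,\alpha_1}\{o_3\}(Helicopter) = 0.4(1 - \alpha_1)$, $m_{S_1}^{\Theta,\alpha_1}\{o_3\}(Airplane \cup Rocket) = 0.6(1 - \alpha_1)$,
$m_{S_1}^{\Theta,\alpha_1}\{o_3\}(\Theta) = \alpha_1$

$m_{S_1}^{\Theta,\alpha_1}\{o_4\}(Airplane \cup Rocket) = 0.6(1 - \alpha_1)$, $m_{S_1}^{\Theta,\alpha_1}\{o_4\}(Helicopter \cup Rocket) = 0.4(1 - \alpha_1)$,
$m_{S_1}^{\Theta,\alpha_1}\{o_4\}(\Theta) = \alpha_1$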

The corresponding discounted BetP relative to the first sensor is summarized
in the following table:

Table 2. $S_1$’s discounted BetPs

S1           o1                       o2                       o3                       o4
Airplane     0.07 + 0.27$\alpha_1$    0.10 + 0.23$\alpha_1$    0.30 + 0.03$\alpha_1$    0.30 + 0.03$\alpha_1$
Helicopter   0.22 + 0.12$\alpha_1$    0.60 - 0.27$\alpha_1$    0.40 - 0.07$\alpha_1$    0.20 + 0.13$\alpha_1$
Rocket       0.72 - 0.38$\alpha_1$    0.30 + 0.03$\alpha_1$    0.30 + 0.03$\alpha_1$    0.50 - 0.17$\alpha_1$


For example, the computation of $BetP_{S_1}^{\Theta,\alpha_1}\{o_1\}(Helicopter)$ is done as follows:

$$BetP_{S_1}^{\Theta,\alpha_1}\{o_1\}(Helicopter) = \frac{0.3(1 - \alpha_1)}{2} + \frac{0.2 + 0.8\alpha_1}{3} = 0.22 + 0.12\alpha_1$$

Using the different values of BetPs, the total distance relative to the sensor
$S_1$ will be equal to:

$$TotalDist = \sum_{j=1}^{4} \sum_{i=1}^{3} \left( BetP_{S_1}^{\Theta,\alpha_1}\{o_j\}(\theta_i) - \delta_{j,i} \right)^2$$

Hence

$$TotalDist = 0.41\alpha_1^2 - 0.56\alpha_1 + 2.81$$

Minimizing TotalDist under the constraint $0 \le \alpha_1 \le 1$ gives
$\alpha_1 = 0.68$. Hence, the discounting factor to be given to sensor $S_1$, taking into
account its opinions on the classes of the objects $o_j$, j = 1, 2, 3, 4, is equal to
0.68.

Applying the same procedure to the beliefs given by the second sensor ($S_2$),
we get $\alpha_2 = 0.52$ as the discounting factor of this sensor. Thus sensor $S_2$ is (a
little) better than sensor $S_1$.
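These two figures can be recomputed from Table 1 with the closed form of Theorem 1, since all the bbas of the example are normalized. The sketch below is our own illustration: the dictionary encoding of the table and the helper names `betp` and `alpha_for` are assumptions, not part of the paper.

```python
A, H, R = "Airplane", "Helicopter", "Rocket"
THETA = frozenset({A, H, R})
TRUTH = [A, H, A, R]   # actual classes of o1..o4 (first row of Table 1)

# Non-zero masses of Table 1, one bba per object.
S1 = [
    {frozenset({R}): 0.5, frozenset({H, R}): 0.3, THETA: 0.2},
    {frozenset({H}): 0.5, frozenset({R}): 0.2, THETA: 0.3},
    {frozenset({H}): 0.4, frozenset({A, R}): 0.6},
    {frozenset({A, R}): 0.6, frozenset({H, R}): 0.4},
]
S2 = [
    {frozenset({A, H}): 0.7, THETA: 0.3},
    {frozenset({A}): 0.3, frozenset({A, H}): 0.4, THETA: 0.3},
    {frozenset({A}): 0.2, frozenset({H, R}): 0.6, THETA: 0.2},
    {frozenset({H, R}): 1.0},
]

def betp(m):
    """Pignistic probabilities of a normalized bba (no mass on the empty set)."""
    return {x: sum(mass / len(b) for b, mass in m.items() if x in b) for x in THETA}

def alpha_for(reports):
    """Closed-form discounting factor of Theorem 1 for one sensor."""
    P = [betp(m) for m in reports]
    n, p = len(P), len(THETA)
    sum_sq = sum(v * v for row in P for v in row.values())
    sum_hit = sum(row[c] for row, c in zip(P, TRUTH))
    return min(1.0, max(0.0, (sum_hit - sum_sq) / (n / p - sum_sq)))

alpha_1, alpha_2 = alpha_for(S1), alpha_for(S2)
print(round(alpha_1, 2), round(alpha_2, 2))  # 0.68 0.52
```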

Just to get an idea of what the two discounting factors represent (see
section 4.5), their equivalents in the highly simplified schema of 2 objects produce
$\pi$ values of 0.21 and 0.28, respectively, which can be understood as ‘the sensors
are really not good, and the second is just a little better than the first’. This is
indeed what the data also show.

5 Tuning Sensors’ Reports 

Evaluating sensors within this second framework is based on taking into account 
the sensors’ bbas together and not independently as we have done in the previous 
section. 

The idea is to build the best predictor from a set of available sensors. Bad ones 
should be discounted more than good ones. The present method is applicable 
when the main objective is to get the best aggregated report induced from those 
given by the sensors. 

This requires assessing the ‘best’ values of the discounting factors to be allo- 
cated to each sensor knowing that their discounted ‘beliefs’ will be merged. 

The ‘best’ discounting factors are those that make the pignistic
probabilities induced by the conjunctive combination of the discounted bba’s as close
as possible to the actual values, just as in the previous section. This
process is named tuning sensors’ reports.

In order to derive the optimal set of discounting factors, we apply the
following steps. Suppose we knew the discounting factors, we would then:

- For each bba $m_{S_k}^{\Theta}\{o_j\}$, discount it by the discounting factor $\alpha_k$ given to the
sensor $S_k$. We get $m_{S_k}^{\Theta,\alpha_k}\{o_j\}$. This process is applied to the bba given
by each sensor for each object.

- For each object $o_j$ (j = 1, ..., n), combine the different discounted bba’s by
applying the conjunctive rule. We get:

$$m^{\Theta}\{o_j\} = m_{S_1}^{\Theta,\alpha_1}\{o_j\} \mathbin{\textcircled{$\cap$}} \cdots \mathbin{\textcircled{$\cap$}} m_{S_l}^{\Theta,\alpha_l}\{o_j\} \tag{14}$$

where l is the number of sensors. $m^{\Theta}\{o_j\}$ is a joint bba representing the induced
belief on the class to which object $o_j$ belongs, computed by taking into account
the data collected from all the sensors.

- Compute the corresponding $BetP^{\Theta}\{o_j\}$ (relative to the bba $m^{\Theta}\{o_j\}$)
representing the pignistic probability on the class of object $o_j$.

- For each object $o_j$, compute the distance between $BetP^{\Theta}\{o_j\}$ and the real
class of $o_j$. This distance is defined by:

$$Dist\{o_j\} = \sum_{i=1}^{p} \left( BetP^{\Theta}\{o_j\}(\theta_i) - \delta_{j,i} \right)^2$$

where $\delta_{j,i} = 1$ if $c_j = \theta_i$ and 0 otherwise.

- Compute TotalDist as follows:

$$TotalDist = \sum_{j=1}^{n} Dist\{o_j\} \tag{15}$$

This variable depends on the discounting factors $\alpha_1, \alpha_2, \ldots, \alpha_l$.

- In order to find the optimal discounting factors, we have to minimize
TotalDist over the $\alpha$’s under the constraints $0 \le \alpha_v \le 1$, $\forall v \in \{1, \ldots, l\}$.

Example 2. Let us use the same data as in Example 1 (see Table 1), and
apply our second method to the two sensors’ reports, assuming that we want
to get the merged report.

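Once the bba’s of $S_1$ and $S_2$ are discounted, we get respectively
$m_1^{\alpha_1}\{o_j\}$ and $m_2^{\alpha_2}\{o_j\}$, where $j = 1, 2, 3, 4$,
which are linear functions of the discounting factors. For each object $o_j$,
we compute the joint bba $m\{o_j\}$:

$$m\{o_j\} = m_1^{\alpha_1}\{o_j\} \mathbin{\textcircled{$\cap$}} m_2^{\alpha_2}\{o_j\}$$

where the $\alpha$ terms are at most of the form $\prod_{i=1}^{l} \alpha_i$,
where $l$ is the number of sensors ($l = 2$ in the present case).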

The corresponding BetPs relative to these bba’s are also linear functions of
the same product terms. The values of $Dist\{o_j\}$ for the objects, as well
as $TotalDist$, are quadratic functions of the previous product terms.

So its minimization over the $\alpha_i$ is very simple and can be achieved by
any minimization program. Even with more than two sensors, any minimization
program can give the values of the $\alpha$’s.

In the present case, $\alpha_1 = 0.28$ and $\alpha_2 = 0.12$. It should be
emphasized that the $\alpha$ coefficients computed with this second method
should not be confused with those computed with the first one. Here we want
the $\alpha$’s such that the multisensor is ‘optimal’, whereas in the first
method we compute $\alpha$ in order to evaluate the quality of each
individual sensor.



6 Conclusion 

In the TBM, the degrees of reliability given to sensors are represented by
discounting factors. In this paper, we have presented one method for assessing
these discounting factors in a classification context where we have at our
disposal a learning set in which the classes of the objects are perfectly
known, and where each sensor is considered alone.

We have also presented a tuning method by which each sensor in a group of
sensors is partially discounted so that the overall set of sensors is optimal.

These methods are presented by studying a classification problem. They can 
easily be extended to other problems of prediction. All that is required is a 
learning set and a distance between the prediction and the actual values. 

We have presented operational methods to assess the discounting factors in
two contexts. They should be useful for any problem of multisensor data
fusion [14].







References 

1. Ayoun, A., Smets, P.: Data association in multi-target detection using the
transferable belief model. Intern. J. Intell. Systems (to appear) (2001)

2. Denoeux, T.: A k-nearest neighbor classification rule based on
Dempster-Shafer theory. IEEE Transactions on Systems, Man, and Cybernetics,
Vol. 25, No. 5, May (1995) 804-813

3. Elouedi, Z., Mellouli, K., Smets, P.: Classification with belief decision
trees. Proceedings of the Ninth International Conference on Artificial
Intelligence: Methodology, Systems, Applications, AIMSA’2000 (2000) 80-90

4. Guan, J. W., Bell, D. A.: Discounting and Combination Operations in
Evidential Reasoning. Proceedings of the Ninth Conference on Uncertainty in
Artificial Intelligence (1993) 80-90

5. Ling, X., Rudd, W. G.: Combining Opinions From Several Experts. Applied
Artificial Intelligence 3 (1989) 439-452

6. Shafer, G.: A mathematical theory of evidence. Princeton University Press
(1976)

7. Shafer, G.: The Combination of Evidence. International Journal of
Intelligent Systems 1 (1986) 155-179

8. Smets, P.: The combination of evidence in the Transferable Belief Model.
IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (1990)
321-344

9. Smets, P.: The transferable belief model for expert judgments and
reliability problems. Analysis and Management of Uncertainty: Theory and
Applications, B.M. Ayyub, M.M. Gupta and L.N. Kanal (editors), Elsevier
Science Publishers B.V. (1992)

10. Smets, P., Kennes, R.: The transferable belief model. Artificial
Intelligence 66 (1994) 191-234

11. Smets, P.: The transferable belief model for quantified belief
representation. D.M. Gabbay and Ph. Smets (eds.) Handbook of Defeasible
Reasoning and Uncertainty Management Systems 1, Kluwer, Dordrecht (1998)
267-301

12. Smets, P.: The Application of the Transferable Belief Model to Diagnostic
Problems. Int. J. Intelligent Systems 13 (1998) 127-158

13. Smets, P.: Practical Uses of Belief Functions. Laskey K. B. and Prade H.
(eds.) Uncertainty in Artificial Intelligence 15, UAI99 (1999) 612-621

14. Waltz, E., Llinas, J.: Multisensor Data Fusion. Artech House, Boston (1990)

15. Zouhal, L. M., Denoeux, T.: An evidence-theoretic k-NN rule with parameter
optimization. IEEE Transactions on Systems, Man and Cybernetics C, 28 (2)
(1998) 263-271




Coarsening Approximations of Belief Functions 



Amel Ben Yaghlane^1, Thierry Denoeux^2, and Khaled Mellouli^1

^1 Institut Supérieur de Gestion de Tunis,
41 Avenue de la Liberté, Cité Bouchoucha, 2000 Le Bardo - Tunis - Tunisia
byaghlEuiOplcUiet .tn, khaled.mellouli@ihec.rnu.tn
^2 Université de Technologie de Compiègne
UMR CNRS 6599 Heudiasyc
BP 20529 - F-60205 Compiègne cedex - France
Thierry.Denoeux@hds.utc.fr



Abstract. A method is proposed for reducing the size of a frame of
discernment, in such a way that the loss of information content in a set
of belief functions is minimized. This approach makes it possible to compute
strong inner and outer approximations which can be combined efficiently using
the Fast Möbius Transform algorithm.



1 Introduction 

The Dempster-Shafer theory of Belief Functions (BF’s) is now widely accepted 
as a rich and flexible framework for representing and reasoning with imperfect in- 
formation. The concept of belief function subsumes those of probability and pos- 
sibility measures, making the theory very general. Situations of weak knowledge 
and heterogeneous information sources are easily modeled within this theory, 
making it quite suitable in many application domains such as medical diagnosis, 
sensor fusion and pattern recognition [14]. 

This generality, however, has a cost in terms of computational complexity. A
BF (or, equivalently, a mass function) assigns a number to each of the $2^n$
subsets of the frame of discernment $\Omega$ (with $|\Omega| = n$), with
$2^n - 1$ degrees of freedom, which is much larger than what is needed to
specify a probability or a possibility measure. Although BF’s as elicited from
experts or inferred from observation data are usually constrained to be of a
simple form, the fusion of several BF’s using Dempster’s rule of combination
almost inevitably increases the number of focal sets (i.e., subsets of
$\Omega$ with a positive mass of belief), resulting in high storage and
computational requirements for large-scale problems.

The algorithmic complexity of combining several BF’s has been studied from
a theoretical point of view by Orponen [10], who proved that the problem is
#P-complete. Recently, Wilson [16] provided a very complete review of
algorithmic issues related to the manipulation of BF’s. Currently, two
algorithms exist for computing the conjunctive combination
$m_1 \mathbin{\textcircled{$\cap$}} m_2$ of two mass functions $m_1$ and
$m_2$ (similar methods hold for the disjunctive combination):

— the mass-based algorithm, initially sketched by Shafer, involves considering
each focal set $A$ of $m_1$, each focal set $B$ of $m_2$, and assigning the
mass $m_1(A)\,m_2(B)$ to the set $A \cap B$. Using this method, the
combination can be performed in time proportional to
$n\,|\mathcal{F}(m_1)|\,|\mathcal{F}(m_2)|$, where $|\mathcal{F}(m_i)|$
denotes the number of focal sets of $m_i$ ($i = 1, 2$). The time needed for
the combination of $K$ BF’s $m_1, \ldots, m_K$ depends on the particular
structure of the mass functions, and is at worst roughly proportional to
$\prod_{i=1}^{K} |\mathcal{F}(m_i)|$, as shown by Wilson [16].

— the Fast Möbius Transform (FMT) method [8] converts each mass function
$m_i$ into its associated commonality function $q_i$; the product of these
functions is computed, and the result is converted back into a mass function.
The algorithm takes time proportional to $K n^2 2^n$.

The choice of one of these methods depends on the structure of the mass
functions. As remarked by Wilson, if the number of focal sets of the combined
belief function is much smaller than $2^n$, then the mass-based method is
likely to be faster. However, this is generally not known in advance. If one
of the BF’s has a number of focal sets close to $2^n$, then the FMT method is
likely to be better. However, this method becomes impractical when $\Omega$
has more than 15 to 20 elements.

When the combination of several BF’s cannot be computed exactly, one has 
to resort to stochastic or deterministic approximation procedures [16]. Since the 
mass-based method for combining BF’s is the most widely used, most determin- 
istic methods (which are exclusively considered here) have been designed with 
the aim of reducing the number of focal elements. This is true, in particular, for 
the summarization method initially introduced by Lowrance et al. [6], and for 
the more sophisticated methods proposed subsequently [15] [1] [5] [11] [2]. 

In this paper, a different approach is investigated. Instead of reducing the
number of focal elements, we propose to reduce the size of the frame of
discernment, which can be expected to drastically decrease the computing time
of the FMT combination method, and can even make it applicable to find
reasonable approximations in the case of large-size problems. Given a set of
BF’s, we propose to find a coarsening of the frame $\Omega$ that will preserve
as much as possible of the information content of the belief functions. This
approach makes it possible to compute inner and outer approximations, from
which lower and upper bounds for the combined belief values can be derived.

The following section summarizes the background definitions and results 
needed in the sequel. Our approximation method is then described in Section 3, 
and a simulation example is presented in Section 4. 

2 Background 

2.1 Basic Concepts 

The main concepts of evidence theory are only summarized here. More details
can be found in Refs. [12] and [13]. Let $\Omega$ denote a finite set called
the frame of discernment. A mass function, or basic belief assignment (bba),
is a function $m : 2^\Omega \to [0, 1]$ verifying:

$$\sum_{A \subseteq \Omega} m(A) = 1. \qquad (1)$$

Each mass of belief $m(A)$ measures the amount of belief that is exactly
committed to $A$. A bba $m$ such that $m(\emptyset) = 0$ is said to be normal.
This condition will not be imposed here. The subsets $A$ of $\Omega$ such that
$m(A) > 0$ are called focal sets of $m$. Let $\mathcal{F}(m) \subseteq
2^\Omega$ denote the set of focal sets of $m$.

The belief function induced by $m$ is a function $bel : 2^\Omega \to [0, 1]$,
defined as:

$$bel(A) = \sum_{\emptyset \neq B \subseteq A} m(B) \qquad (2)$$

for all $A \subseteq \Omega$. $bel(A)$ represents the amount of support given
to $A$.

The plausibility function associated with a bba $m$ is a function
$pl : 2^\Omega \to [0, 1]$, defined as:

$$pl(A) = \sum_{B \cap A \neq \emptyset} m(B) \quad \forall A \subseteq \Omega. \qquad (3)$$

$pl(A)$ represents the potential amount of support that could be given to $A$.

Given two bba’s $m_1$ and $m_2$ defined over the same frame of discernment
$\Omega$ and induced by two distinct pieces of evidence, we can combine them
in two ways using the conjunctive or the disjunctive rules of combination [13]
defined, respectively, as:

$$(m_1 \mathbin{\textcircled{$\cap$}} m_2)(A) = \sum_{B \cap C = A} m_1(B)\,m_2(C) \qquad (4)$$

$$(m_1 \mathbin{\textcircled{$\cup$}} m_2)(A) = \sum_{B \cup C = A} m_1(B)\,m_2(C) \qquad (5)$$

for all $A \subseteq \Omega$. The choice of one of these combination rules is
related to the reliability of the two sources. In fact, if we know that both
sources of information are fully reliable, then we combine them conjunctively.
However, if we only know that at least one of the two sources is reliable,
then we combine them disjunctively.

The conjunctive and disjunctive rules can be conveniently expressed by means
of the commonality function $q$ and the implicability function $b$, defined,
respectively, as

$$q(A) = \sum_{B \supseteq A} m(B) \qquad (6)$$

and

$$b(A) = bel(A) + m(\emptyset) \qquad (7)$$

for all $A \subseteq \Omega$. If $q_1 \mathbin{\textcircled{$\cap$}} q_2$
denotes the commonality function associated to
$m_1 \mathbin{\textcircled{$\cap$}} m_2$, and
$b_1 \mathbin{\textcircled{$\cup$}} b_2$ denotes the implicability function
associated to $m_1 \mathbin{\textcircled{$\cup$}} m_2$, we have the following
simple relations:

$$q_1 \mathbin{\textcircled{$\cap$}} q_2 = q_1 q_2 \qquad (8)$$

$$b_1 \mathbin{\textcircled{$\cup$}} b_2 = b_1 b_2 \qquad (9)$$

The importance of this result arises from the fact that the functions $m$, $q$
and $b$ (as well as $bel$ and $pl$) are equivalent representations, in the
sense that, given any of these functions, it is possible to recover all the
others. The conversion between these functions can be efficiently done using
the FMT algorithm [8] in time proportional to $n^2 2^n$ [16]. Relations (8)
and (9) provide the basis for the FMT-based method for combining BF’s, which
consists in transforming the BF’s or the bba’s to $q$ or $b$, computing the
product, and converting back the result into a mass or a belief function. In
contrast, the more traditional mass-based approach relies exclusively on
Eqs (4) and (5).
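The two combination strategies can be compared on a small frame with the following Python sketch. It is an illustration under stated assumptions, not the implementation of [8]: subsets of $\Omega$ are encoded as bitmasks (bit $i$ set when the $(i+1)$-th element belongs to the subset), a mass function is stored as a dense list of length $2^n$, and the function names are hypothetical.

```python
def mass_to_commonality(m, n):
    """q(A) = sum of m(B) over all B containing A, computed one element at a time."""
    q = list(m)
    for i in range(n):
        bit = 1 << i
        for a in range(1 << n):
            if not (a & bit):
                q[a] += q[a | bit]
    return q

def commonality_to_mass(q, n):
    """Inverse transform: undo the superset sums one element at a time."""
    m = list(q)
    for i in range(n):
        bit = 1 << i
        for a in range(1 << n):
            if not (a & bit):
                m[a] -= m[a | bit]
    return m

def combine_fmt(m1, m2, n):
    """Conjunctive combination through the product of commonalities (Eq. 8)."""
    q1 = mass_to_commonality(m1, n)
    q2 = mass_to_commonality(m2, n)
    return commonality_to_mass([x * y for x, y in zip(q1, q2)], n)

def combine_mass_based(m1, m2, n):
    """Conjunctive combination by enumerating pairs of focal sets (Eq. 4)."""
    out = [0.0] * (1 << n)
    for a, va in enumerate(m1):
        if va:
            for b, vb in enumerate(m2):
                if vb:
                    out[a & b] += va * vb
    return out

# Hypothetical example on a frame with n = 3 elements.
n = 3
m1 = [0.0] * 8
m1[0b011], m1[0b111] = 0.6, 0.4
m2 = [0.0] * 8
m2[0b110], m2[0b001] = 0.7, 0.3
result_fmt = combine_fmt(m1, m2, n)
result_mass = combine_mass_based(m1, m2, n)
```

Both routes give the same mass function up to rounding. The dense representation makes the cost of the transform independent of the number of focal sets, which is why the FMT route only pays off when many of the $2^n$ subsets carry mass.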



2.2 Coarsenings and Refinements 

In applying the BF framework to a real-world problem, the definition of the
frame of discernment is a crucial step. As remarked by Shafer [12], the degree
of “granularity” of the frame is always a matter of convention, as any element
$\omega$ of $\Omega$ representing a “state of nature” could always be split
into several possibilities. Hence, it is fundamental to examine how a BF
defined on a frame may be expressed in a finer or, conversely, in a coarser
frame.

Let $\Omega$ and $\Theta$ denote two finite sets. A mapping
$\rho : 2^\Theta \to 2^\Omega$ is called a refining if it verifies the
following properties:

1. The set $\{\rho(\{\theta\}), \theta \in \Theta\} \subseteq 2^\Omega$ is a
partition of $\Omega$.

2. For all $A \subseteq \Theta$, we have

$$\rho(A) = \bigcup_{\theta \in A} \rho(\{\theta\}) \qquad (10)$$

Following the terminology introduced by Shafer, the set $\Theta$ is then
called a coarsening of $\Omega$, and $\Omega$ is called a refinement of
$\Theta$.

Note that defining a coarsening of a frame $\Omega$ is formally equivalent to
defining a partition of $\Omega$. Let $\Theta$ be such a partition. The
function $\rho : 2^\Theta \to 2^\Omega$ such that $\rho(\{\theta\}) = \theta$
for all $\theta \in \Theta$, and verifying (10), is a refining of $\Theta$,
and $\Theta$ is a coarsening of $\Omega$.

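A bba $m^\Theta$ defined on a frame $\Theta$ may easily be carried to a
refinement $\Omega$ by means of the vacuous extension, which transfers the
mass $m^\Theta(A)$ to $\rho(A)$, for all $A \subseteq \Theta$ (in the
following, the superscript of a bba will always indicate its domain). The
resulting bba on $\Omega$ is then defined as

$$m^\Omega(B) = \begin{cases} m^\Theta(A), & \text{if } B = \rho(A) \text{ for some } A \subseteq \Theta \\ 0, & \text{otherwise.} \end{cases} \qquad (11)$$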

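The inverse operation, i.e., carrying a bba $m^\Omega$ to a coarsening
$\Theta$ of $\Omega$, is not so easy because a refining
$\rho : 2^\Theta \to 2^\Omega$ is not, in general, onto; there are usually
subsets $A$ of $\Omega$ which are not “discerned” by $\Theta$ and, hence, are
not equal to $\rho(B)$ for any $B \subseteq \Theta$ [12]. In order to
associate a subset of $\Theta$ with each subset $A$ of $\Omega$, an inner
reduction $\underline{\theta}$ and an outer reduction $\overline{\theta}$ may
be defined, respectively, as functions from $2^\Omega$ to $2^\Theta$, such
that:

$$\underline{\theta}(A) = \{\theta \in \Theta \mid \rho(\{\theta\}) \subseteq A\} \qquad (12)$$

$$\overline{\theta}(A) = \{\theta \in \Theta \mid \rho(\{\theta\}) \cap A \neq \emptyset\} \qquad (13)$$

for all $A \subseteq \Omega$. Hence, the mass $m^\Omega(A)$ given to
$A \subseteq \Omega$ by a bba $m^\Omega$ can be transferred either to
$\underline{\theta}(A)$, or to $\overline{\theta}(A)$. This leads to the
following definitions:

$$\underline{m}^\Theta(B) = \sum_{\{A \subseteq \Omega,\ B = \underline{\theta}(A)\}} m^\Omega(A) \quad \forall B \subseteq \Theta \qquad (14)$$

$$\overline{m}^\Theta(B) = \sum_{\{A \subseteq \Omega,\ B = \overline{\theta}(A)\}} m^\Omega(A) \quad \forall B \subseteq \Theta. \qquad (15)$$

The bba’s $\underline{m}^\Theta$ and $\overline{m}^\Theta$ will be called,
respectively, the inner and the outer reduction of $m^\Omega$
($\overline{m}^\Theta$ is called the restriction of $m^\Omega$ by Shafer
[12, p. 126]; the definition of $\underline{m}^\Theta$ is, to our knowledge,
new).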

To simplify the manipulation of expressions when changing frames, let us 
introduce the following definition. 

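Definition 1 Let $\Omega_1$ and $\Omega_2$ be two finite sets, $\varphi$ a
mapping from $2^{\Omega_1}$ to $2^{\Omega_2}$, $m^{\Omega_1}$ a bba on
$\Omega_1$, and $m^{\Omega_2}$ a bba on $\Omega_2$. We say that $m^{\Omega_2}$
is the image of $m^{\Omega_1}$ by $\varphi$, and we note
$m^{\Omega_2} = \varphi(m^{\Omega_1})$, if

$$m^{\Omega_2}(A) = \sum_{\{B \subseteq \Omega_1,\ \varphi(B) = A\}} m^{\Omega_1}(B)$$

for all $A \subseteq \Omega_2$.

According to Def. 1, the vacuous extension of $m^\Theta$ in $\Omega$ may be
noted $m^\Omega = \rho(m^\Theta)$, and Eqs (14) and (15) may be rewritten as
$\underline{m}^\Theta = \underline{\theta}(m^\Omega)$ and
$\overline{m}^\Theta = \overline{\theta}(m^\Omega)$.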
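The reductions and the vacuous extension can be illustrated with a short Python sketch. It assumes a hypothetical frame $\Omega = \{a, b, c, d\}$ coarsened into three elements named `t1`, `t2`, `t3`; a bba is a dictionary from frozensets to masses, and the helper names are not taken from the paper.

```python
def refine(partition, A):
    """rho(A): union of the blocks rho({theta}) for theta in A (Eq. 10)."""
    return frozenset().union(*(partition[t] for t in A))

def inner_reduction(partition, A):
    """Elements theta whose block is entirely contained in A (Eq. 12)."""
    return frozenset(t for t, block in partition.items() if block <= A)

def outer_reduction(partition, A):
    """Elements theta whose block intersects A (Eq. 13)."""
    return frozenset(t for t, block in partition.items() if block & A)

def image(m, phi):
    """Image of the bba m by the set mapping phi (Definition 1)."""
    out = {}
    for A, v in m.items():
        out[phi(A)] = out.get(phi(A), 0.0) + v
    return out

# Hypothetical coarsening of Omega = {a, b, c, d} and hypothetical bba on Omega.
partition = {"t1": frozenset("ab"), "t2": frozenset("c"), "t3": frozenset("d")}
m = {frozenset("a"): 0.5, frozenset("abc"): 0.3, frozenset("abcd"): 0.2}

# Inner and outer reductions on the coarsened frame (Eqs 14 and 15).
m_inner = image(m, lambda A: inner_reduction(partition, A))
m_outer = image(m, lambda A: outer_reduction(partition, A))

# Their vacuous extensions back to Omega.
m_inner_ext = image(m_inner, lambda B: refine(partition, B))
m_outer_ext = image(m_outer, lambda B: refine(partition, B))
```

In this example the focal set $\{a\}$ is not discerned by the coarsening: its mass goes to the empty set in the inner reduction and to `{t1}` in the outer one, so the extensions carry it to $\emptyset$ and $\{a, b\}$, respectively.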

2.3 Inclusion of Belief Functions 

Another notion of interest is that of strong inclusion of bba’s [3]. Let $m$
and $m'$ be two BF’s with focal elements
$\mathcal{F}(m) = \{F_1, \ldots, F_p\}$ and
$\mathcal{F}(m') = \{F'_1, \ldots, F'_{p'}\}$. Then $m$ is said to be strongly
included in $m'$, or to be a specialization of $m'$ (noted
$m \subseteq m'$), iff there exists a non-negative matrix $W$ with entries
$W_{ij}$ ($i = 1, \ldots, p$; $j = 1, \ldots, p'$) such that

$$\sum_{j=1}^{p'} W_{ij} = m(F_i), \quad i = 1, \ldots, p, \qquad (16)$$

$$\sum_{i=1}^{p} W_{ij} = m'(F'_j), \quad j = 1, \ldots, p' \qquad (17)$$

and $W_{ij} > 0 \Rightarrow F_i \subseteq F'_j$. The relationship between $m$
and $m'$ may be seen as a transfer of mass from each focal element $F_i$ of
$m$ to supersets $F'_j \supseteq F_i$, the quantity $W_{ij}$ denoting the part
of $m(F_i)$ transferred to $F'_j$. If $m \subseteq m'$, then we have (with
obvious notations) $pl \le pl'$ and $b' \le b$, but the reverse is not true.

An approximation $m^-$ (resp. $m^+$) of a bba $m$ is called a strong inner
(resp. outer) approximation if $m^- \subseteq m$ (resp. $m \subseteq m^+$).
Given strong inner and outer approximations of several BF’s, it is possible to
obtain lower and upper bounds for the belief and the plausibility values of
the combined BF [3] [2]. Methods for constructing such approximations were
proposed by Dubois and Prade [4] in a possibilistic setting, and by Denoeux
[2] using an approach based on the clustering of focal sets.

3 Coarsening Approximations of Belief Functions 

In this section, we propose a new heuristic method for constructing strong
inner and outer approximations of BF’s. Our method consists in finding a
coarsening $\Theta$ of the initial frame $\Omega$ such that the approximating
BF can be represented exactly in $\Theta$. We first present the basic
principle and the algorithm in the case of a single BF, and then extend the
method to the simultaneous approximation of several BF’s.

3.1 Basic Principle 

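Main result. Let $m^\Omega$ denote a bba on $\Omega$, $\Theta$ a coarsening of
$\Omega$, $\rho$ the refining from $2^\Theta$ to $2^\Omega$, and
$\underline{\theta}$ and $\overline{\theta}$ the associated inner and outer
reduction functions. Let $\underline{m}^\Theta$ and $\overline{m}^\Theta$
denote the inner and outer reductions of $m^\Omega$ as defined by Eqs (14) and
(15), and let $\underline{m}^\Omega$ and $\overline{m}^\Omega$ be the vacuous
extensions of $\underline{m}^\Theta$ and $\overline{m}^\Theta$, respectively,
on $\Omega$. We thus have

$$\underline{m}^\Omega = \rho(\underline{m}^\Theta) = \rho \circ \underline{\theta}(m^\Omega) \qquad (18)$$

$$\overline{m}^\Omega = \rho(\overline{m}^\Theta) = \rho \circ \overline{\theta}(m^\Omega) \qquad (19)$$

Theorem 1 $\underline{m}^\Omega$ and $\overline{m}^\Omega$ are, respectively,
strong inner and outer approximations of $m^\Omega$:
$\underline{m}^\Omega \subseteq m^\Omega \subseteq \overline{m}^\Omega$.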

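Proof: We have, by construction,

$$\underline{m}^\Omega(A) = \sum_{\{B \subseteq \Omega,\ A = \rho \circ \underline{\theta}(B)\}} m^\Omega(B) \quad \forall A \subseteq \Omega \qquad (20)$$

$$\overline{m}^\Omega(A) = \sum_{\{B \subseteq \Omega,\ A = \rho \circ \overline{\theta}(B)\}} m^\Omega(B) \quad \forall A \subseteq \Omega \qquad (21)$$

From Theorem 6.3 in [12, p.118], we have
$\rho(\underline{\theta}(B)) \subseteq B$ for all $B \subseteq \Omega$. Hence,
the mass $\underline{m}^\Omega(A)$ is the sum of masses $m^\Omega(B)$
initially attached to supersets of $A$, which implies that
$\underline{m}^\Omega \subseteq m^\Omega$.

Similarly, $B \subseteq \rho(\overline{\theta}(B))$ for all
$B \subseteq \Omega$, which implies that the mass $\overline{m}^\Omega(A)$ is
the sum of masses $m^\Omega(B)$ initially attached to subsets of $A$, which
implies that $m^\Omega \subseteq \overline{m}^\Omega$. QED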
Matrix representation of bba’s. A very simple construction of
$\underline{m}^\Omega$ and $\overline{m}^\Omega$ for a given coarsening
$\Theta$ can be obtained using the following representation. Let us assume
that the frame $\Omega = \{\omega_1, \ldots, \omega_n\}$ has $n$ elements, and
the bba $m^\Omega$ under consideration has $p$ focal sets:
$\mathcal{F}(m^\Omega) = \{A_1, \ldots, A_p\}$. One can represent the bba
$m^\Omega$ by a pair $(\mathbf{m}^\Omega, \mathbf{F}^\Omega)$ where
$\mathbf{m}^\Omega$ is the $p$-dimensional vector of masses
$\mathbf{m}^\Omega = (m^\Omega(A_1), \ldots, m^\Omega(A_p))$ and
$\mathbf{F}^\Omega$ is a $p \times n$ binary matrix such that

$$F^\Omega_{ij} = A_i(\omega_j) = \begin{cases} 1, & \text{if } \omega_j \in A_i \\ 0, & \text{otherwise,} \end{cases}$$

where $A_i(\cdot)$ denotes the indicator function of focal set $A_i$.

This representation is similar to an (objects x attributes) binary data matrix
as commonly encountered in data analysis. Here, each focal set corresponds to
an object, and each element of the frame corresponds to an attribute. Each
object $A_i$ has a weight $m^\Omega(A_i)$. Since a coarsening is inherently
equivalent to a partition of $\Omega$, finding a suitable coarsening is
actually a problem of classifying the columns of data matrix
$\mathbf{F}^\Omega$, which is a classical clustering problem (see, e.g. [7]).
Note that, in contrast, the clustering approximation method introduced by
Denoeux [2] is based on the classification of the lines of $\mathbf{F}^\Omega$.

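To see how the bba’s $\underline{m}^\Theta$, $\overline{m}^\Theta$,
$\underline{m}^\Omega$, $\overline{m}^\Omega$ can be constructed from
$\mathbf{F}^\Omega$, let us denote by $P = \{I_1, \ldots, I_c\}$ the partition
of $N_n = \{1, \ldots, n\}$ corresponding to the coarsening
$\Theta = \{\theta_1, \ldots, \theta_c\}$, i.e.,

$$\theta_r = \{\omega_j,\ j \in I_r\}, \quad r = 1, \ldots, c.$$

Let $(\underline{\mathbf{m}}^\Theta, \underline{\mathbf{F}}^\Theta)$ denote
the matrix representation of $\underline{m}^\Theta$. Matrix
$\underline{\mathbf{F}}^\Theta$ may be obtained from $\mathbf{F}^\Omega$ by
merging the columns $\mathbf{F}^\Omega_{\cdot j}$ for $j \in I_r$, and
replacing them by their minimum:

$$\underline{F}^\Theta_{i,r} = \min_{j \in I_r} F^\Omega_{i,j} \quad \forall i, r \qquad (22)$$

and we have $\underline{\mathbf{m}}^\Theta = \mathbf{m}^\Omega$. The
justification for this is that the focal elements of $\underline{m}^\Theta$
are the sets $\underline{\theta}(A_i)$, and
$\theta \in \underline{\theta}(A_i)$ iff $\rho(\{\theta\}) \subseteq A_i$,
where $\rho$ is the refining associated to $\Theta$.

Similarly, if $(\overline{\mathbf{m}}^\Theta, \overline{\mathbf{F}}^\Theta)$
denotes the matrix representation of $\overline{m}^\Theta$, we have

$$\overline{F}^\Theta_{i,r} = \max_{j \in I_r} F^\Omega_{i,j} \quad \forall i, r \qquad (23)$$

and $\overline{\mathbf{m}}^\Theta = \mathbf{m}^\Omega$.

The matrix representations of $\underline{m}^\Omega$ and
$\overline{m}^\Omega$, the vacuous extensions of $\underline{m}^\Theta$ and
$\overline{m}^\Theta$, are then obtained as:

$$\underline{F}^\Omega_{i,j} = \underline{F}^\Theta_{i,r} \quad \forall j \in I_r \qquad (24)$$

$$\overline{F}^\Omega_{i,j} = \overline{F}^\Theta_{i,r} \quad \forall j \in I_r \qquad (25)$$

and $\underline{\mathbf{m}}^\Omega = \overline{\mathbf{m}}^\Omega =
\mathbf{m}^\Omega$.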
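Equations (22) to (25) translate directly into column operations. The sketch below uses plain Python lists and 0-based column indices; the example matrix, the partition and the function names are hypothetical and only meant to illustrate the construction.

```python
def reduce_columns(F, partition, op):
    """Merge the columns of F listed in each class I_r with op (min: Eq. 22, max: Eq. 23)."""
    return [[op(row[j] for j in I_r) for I_r in partition] for row in F]

def extend_columns(F_theta, partition, n):
    """Vacuous extension: copy column r of F_theta to every column j in I_r (Eqs 24-25)."""
    F = [[0] * n for _ in F_theta]
    for i, row in enumerate(F_theta):
        for r, I_r in enumerate(partition):
            for j in I_r:
                F[i][j] = row[r]
    return F

# Hypothetical bba on a 4-element frame: focal sets {w1}, {w1,w2,w3}, Omega.
masses = [0.5, 0.3, 0.2]            # the mass vector is left unchanged throughout
F = [[1, 0, 0, 0],
     [1, 1, 1, 0],
     [1, 1, 1, 1]]
partition = [[0, 1], [2], [3]]      # coarsening {w1,w2}, {w3}, {w4}

F_inner_theta = reduce_columns(F, partition, min)
F_outer_theta = reduce_columns(F, partition, max)
F_inner = extend_columns(F_inner_theta, partition, 4)
F_outer = extend_columns(F_outer_theta, partition, 4)
```

Each row of `F_inner` is contained in the corresponding row of `F`, which is itself contained in the corresponding row of `F_outer`; this is the row-wise counterpart of Theorem 1.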
3.2 Clustering Algorithm

As shown above, given a coarsening $\Theta$ of a frame of discernment $\Omega$
and a basic belief assignment $m^\Omega$, we can define strong inner and outer
approximations $\underline{m}^\Omega$ and $\overline{m}^\Omega$. It is clear
that the quality of these approximations depends on the coarsening considered.
How, then, should the coarsening be chosen so as to obtain good approximations
of $m^\Omega$?

To answer this question, we propose to use a measure of information allowing
us to reduce the size of the frame of discernment while retaining as much
information as possible from the original belief function. Several approaches
have been proposed to measure the information contained in a piece of evidence
[9]. Among these approaches, we will use the generalized cardinality [4,2]
defined as:

$$|m| = \sum_{i=1}^{p} m(A_i)\,|A_i|, \qquad (26)$$

where $A_i$, $i = 1, \ldots, p$ are the focal sets of $m$. The larger $|m|$,
the more imprecise the bba $m$ (and the less information it contains).

It follows from Theorem 1 and the definition of strong inclusion that

$$|\underline{m}^\Omega| \le |m^\Omega| \le |\overline{m}^\Omega|$$

Hence, a way to keep $\underline{m}^\Omega$ and $\overline{m}^\Omega$ as
“close” as possible to $m^\Omega$ is to minimize the increase of cardinality
from $m^\Omega$ to $\overline{m}^\Omega$ (which corresponds to a loss of
information), and to minimize the decrease of cardinality from $m^\Omega$ to
$\underline{m}^\Omega$ (corresponding to meaningless information).
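As a small numerical illustration of Eq. (26) and of the ordering above, the following Python sketch computes the generalized cardinality from a matrix representation. The three matrices are hypothetical: they describe a bba on a 4-element frame and its inner and outer coarsening approximations obtained by merging the first two elements of the frame.

```python
def cardinality(masses, F):
    """Generalized cardinality |m| = sum_i m(A_i) |A_i| (Eq. 26)."""
    return sum(w * sum(row) for w, row in zip(masses, F))

masses = [0.5, 0.3, 0.2]
F       = [[1, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]   # original focal sets
F_inner = [[0, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]   # columns 1-2 merged with min
F_outer = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]   # columns 1-2 merged with max

card_inner = cardinality(masses, F_inner)
card_exact = cardinality(masses, F)
card_outer = cardinality(masses, F_outer)
```

Here the inner approximation loses 0.5 units of cardinality and the outer one gains 0.5, both because the focal set of mass 0.5 contains exactly one of the two merged elements.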

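More precisely, let us denote by $\mathcal{P}_c$ the set of all partitions of
$N_n$ in $c$ classes ($c < n$). As shown above, each element of
$\mathcal{P}_c$ corresponds to a coarsening of $\Omega$ with $c$ elements. The
coarsening yielding the “best” (least specific) inner approximation
corresponds to the partition $\underline{P}_c$ defined as:

$$\underline{P}_c = \arg\min_{P \in \mathcal{P}_c} \Delta(m^\Omega, \underline{m}^\Omega)$$

with $\Delta(m^\Omega, \underline{m}^\Omega) = |m^\Omega| -
|\underline{m}^\Omega|$. Similarly, the partition $\overline{P}_c$ yielding
the best (most specific) outer approximation is defined as

$$\overline{P}_c = \arg\min_{P \in \mathcal{P}_c} \Delta(\overline{m}^\Omega, m^\Omega).$$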

We are thus searching for the best coarsening over all possible partitions of
$\Omega$ into $c$ clusters. Unfortunately, the number of possible partitions
is huge, and exploring all of them is not computationally tractable.
Hierarchical clustering [7] is a heuristic approach for constructing a
sequence of nested partitions of a given set. In our case, this approach will
consist in aggregating sequentially pairs of elements of $\Omega$ until the
desired size of the coarsened frame of discernment is reached. At each step,
the two elements whose aggregation results in the best value of the criterion
will be selected.
More precisely, let $(\mathbf{m}^\Omega, \mathbf{F}^\Omega)$ denote the matrix
representation of $m^\Omega$, and suppose that we are looking for the
coarsening with $n - 1$ elements corresponding to the “best” inner
approximation. The aggregation of elements $\omega_k$ and $\omega_l$ of the
frame corresponds to the fusion of columns $k$ and $l$ of $\mathbf{F}^\Omega$
using the minimum operator. In this process, the number of 1’s in each line
$i$ of matrix $\mathbf{F}^\Omega$ is decreased by one if either
$\omega_k \in A_i$ and $\omega_l \notin A_i$, or $\omega_l \in A_i$ and
$\omega_k \notin A_i$. Hence, the decrease of cardinality is

$$\delta(\omega_k, \omega_l) = \Delta(m^\Omega, \underline{m}^\Omega) = \sum_{i=1}^{p} m_i\,|F^\Omega_{ik} - F^\Omega_{il}| \qquad (27)$$

Note that $\delta(\omega_k, \omega_l)$ can be interpreted as a degree of
dissimilarity between $\omega_k$ and $\omega_l$. The hierarchical clustering
algorithm can then be described as follows:

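— Given: the bba $(\mathbf{m}^\Omega, \mathbf{F}^\Omega)$

— Compute the dissimilarity matrix $D = (\delta(\omega_k, \omega_l))$,
$k, l \in \{1, \ldots, n\}$

— $c \leftarrow n$

— Repeat

• $c \leftarrow c - 1$

• find $k^*$ and $l^*$ such that
$\delta(\omega_{k^*}, \omega_{l^*}) = \min_{k,l} \delta(\omega_k, \omega_l)$

• construct $\underline{\mathbf{F}}^\Theta$ with $c$ columns by aggregating
columns $k^*$ and $l^*$ using the minimum operator

• update dissimilarity matrix $D$

— Until $c$ has the desired value

— Compute $(\underline{\mathbf{m}}^\Omega, \underline{\mathbf{F}}^\Omega)$,
the vacuous extension of
$(\underline{\mathbf{m}}^\Theta, \underline{\mathbf{F}}^\Theta)$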

The computation of outer approximations can be performed in exactly the
same way, except that the minimum operator is replaced by the maximum
operator. After aggregating columns $k$ and $l$ of matrix $\mathbf{F}^\Omega$,
the number of 1’s in each line $i$ of matrix $\mathbf{F}^\Omega$ is now
increased by one if either $\omega_k \in A_i$ and $\omega_l \notin A_i$, or
$\omega_l \in A_i$ and $\omega_k \notin A_i$. Hence, the increase of
cardinality is

$$\Delta(\overline{m}^\Omega, m^\Omega) = \sum_{i=1}^{p} m_i\,|F^\Omega_{ik} - F^\Omega_{il}| = \delta(\omega_k, \omega_l) \qquad (28)$$

We thus arrive at the same dissimilarity measure as in the previous case,
although the resulting coarsening is, in general, different.
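A compact Python sketch of the greedy procedure is given below. Two points are assumptions rather than statements of the paper: the "update dissimilarity matrix" step is not detailed in the text, so the cost of merging two clusters is taken here to be the additional change of generalized cardinality (measured on $\Omega$) that the merge causes, which reduces to $\delta(\omega_k, \omega_l)$ of Eq. (27) for two single elements; and costs are recomputed from scratch at every step instead of being maintained in a matrix $D$. Function names and data are hypothetical.

```python
def coarsen(masses, F, c, op=min):
    """Greedy agglomerative partition of the columns of F into c classes.

    op=min targets an inner approximation, op=max an outer one.
    Returns the partition as a list of lists of 0-based column indices.
    """
    n = len(F[0])
    clusters = [[j] for j in range(n)]

    def change(I):
        # Absolute change of generalized cardinality on Omega when the
        # columns in I are replaced by their op (min or max).
        return sum(w * abs(sum(row[j] for j in I) - len(I) * op(row[j] for j in I))
                   for w, row in zip(masses, F))

    while len(clusters) > c:
        # Cost of each candidate merge = extra cardinality change it causes.
        _, k, l = min(
            (change(clusters[k] + clusters[l]) - change(clusters[k]) - change(clusters[l]), k, l)
            for k in range(len(clusters)) for l in range(k + 1, len(clusters))
        )
        clusters[k] = clusters[k] + clusters[l]
        del clusters[l]
    return clusters

# Hypothetical bba on a 4-element frame.
masses = [0.5, 0.3, 0.2]
F = [[1, 0, 0, 0],
     [1, 1, 1, 0],
     [1, 1, 1, 1]]
```

On this example the second and third elements are merged first at zero cost, since no focal set separates them. Asking for two classes then gives different partitions for the inner criterion (`op=min`) and the outer criterion (`op=max`).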

Remark 1 Several lines of $\underline{\mathbf{F}}^\Omega$ or
$\overline{\mathbf{F}}^\Omega$ computed by the above algorithm may be
identical, which means that the number of focal sets has decreased. In this
case, the binary matrix of focal sets and the mass vector have to be
rearranged so that the line dimension becomes equal to the number of focal
sets.

Remark 2 As remarked by Wilson [16], coarsening a frame may sometimes
result in no loss of information. Two elements $\omega_j$ and $\omega_k$ can
be merged without losing information if $\delta(\omega_j, \omega_k) = 0$.
Hence, “lossless coarsenings” (using Wilson’s terminology) will be found in
the first steps of our algorithm, if such solutions exist. Our algorithm will
even find the “coarsest lossless coarsening” as defined by Wilson [16].
Remark 3 Our algorithm is basically the classical hierarchical clustering
algorithm applied to the binary matrix of focal sets. Hence, the time needed
to compute an inner or outer coarsening approximation by this method is
proportional
to $n^3$.



3.3 Inner and Outer Approximations of Combined Belief Functions 



The approximation method proposed in the previous section can be generalized
to compute inner and outer approximations of combined belief functions. Rather
than computing the combination of the original belief functions defined on
$\Omega$, we will compute the combination of their approximations defined over
a common coarsening of $\Omega$ using the FMT algorithm [8]. Then the vacuous
extension defined above will be used to recover the combined belief function
on the original frame $\Omega$ from its approximations defined over the
coarsened frame.

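Let $m_1^\Omega, \ldots, m_K^\Omega$ be $K$ bba’s defined over a frame of
discernment $\Omega$ to be combined using either the conjunctive or the
disjunctive rules of combination. Let
$(\mathbf{m}_k^\Omega, \mathbf{F}_k^\Omega)$, $k = 1, \ldots, K$ denote their
matrix representations. We wish to find a common coarsening
$\Theta = \{\theta_1, \ldots, \theta_c\}$ of $\Omega$ that will preserve as
much as possible of the information contained in each of the $K$ bba’s. For
that purpose, let us define the following criterion to be minimized for the
construction of an inner approximation:
$\sum_{k=1}^{K} \Delta(m_k^\Omega, \underline{m}_k^\Omega)$, and for the
construction of an outer approximation:
$\sum_{k=1}^{K} \Delta(\overline{m}_k^\Omega, m_k^\Omega)$. To minimize these
criteria, we may simply apply the same hierarchical clustering approach as
above, to the matrix

$$\mathbf{F}^\Omega = \begin{bmatrix} \mathbf{F}_1^\Omega \\ \vdots \\ \mathbf{F}_K^\Omega \end{bmatrix}$$

and the weight vector $\mathbf{m}^\Omega = [\mathbf{m}_1^{\Omega\prime},
\ldots, \mathbf{m}_K^{\Omega\prime}]'$ (prime denotes transposition).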



Determining Inner and Outer Approximations of the Combined Belief
Function. Given $K$ bba’s
$\underline{m}_1^\Theta, \ldots, \underline{m}_K^\Theta$ and
$\overline{m}_1^\Theta, \ldots, \overline{m}_K^\Theta$ defined over the common
coarsened frame $\Theta$ of $\Omega$, we shall proceed as follows to determine
strong inner and outer approximations of their combination:

1. use the FMT algorithm to convert these approximated bba’s to their related
inner and outer commonality or implicability functions.

2. compute the approximated inner and outer combined commonality or
implicability functions over the coarsened frame $\Theta$. In the case of
inner approximation they are given by:
$\underline{q}^\Theta = \prod_{k=1}^{K} \underline{q}_k^\Theta$ and
$\underline{b}^\Theta = \prod_{k=1}^{K} \underline{b}_k^\Theta$, and similarly
for the outer approximations $\overline{q}^\Theta$ and $\overline{b}^\Theta$.

3. convert back these approximated combined commonality or implicability
functions to their related inner and outer combined bba’s
$\underline{m}^\Theta$ and $\overline{m}^\Theta$ using the FMT algorithm.

4. use the vacuous extension to recover the inner and outer approximated
combined belief functions $\underline{m}^\Omega$ and $\overline{m}^\Omega$
from $\underline{m}^\Theta$ and $\overline{m}^\Theta$.
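The four steps can be put together as in the following Python sketch for the conjunctive case. It is a sketch under stated assumptions: subsets are encoded as bitmasks, the coarsening is given as a list of disjoint bitmasks (`blocks`) covering the frame, the coarsening itself is supplied by hand instead of being computed by the clustering algorithm, and all names and data are hypothetical.

```python
def to_commonality(m, n):
    """Mass list of length 2**n -> commonality list (superset sums)."""
    q = list(m)
    for i in range(n):
        for a in range(1 << n):
            if not (a >> i) & 1:
                q[a] += q[a | (1 << i)]
    return q

def from_commonality(q, n):
    """Inverse of to_commonality."""
    m = list(q)
    for i in range(n):
        for a in range(1 << n):
            if not (a >> i) & 1:
                m[a] -= m[a | (1 << i)]
    return m

def reduce_mask(A, blocks, inner):
    """Inner (Eq. 12) or outer (Eq. 13) reduction of the subset A of Omega."""
    out = 0
    for r, block in enumerate(blocks):
        hit = (A & block) == block if inner else (A & block) != 0
        if hit:
            out |= 1 << r
    return out

def approximate_combination(bbas, blocks, inner):
    """Conjunctive combination computed on the coarsened frame, then extended to Omega."""
    c = len(blocks)
    q = [1.0] * (1 << c)
    for m in bbas:                                   # steps 1 and 2
        m_theta = [0.0] * (1 << c)
        for A, v in m.items():
            m_theta[reduce_mask(A, blocks, inner)] += v
        q = [x * y for x, y in zip(q, to_commonality(m_theta, c))]
    m_theta = from_commonality(q, c)                 # step 3
    out = {}
    for B, v in enumerate(m_theta):                  # step 4: vacuous extension
        if abs(v) > 1e-12:
            A = 0
            for r, block in enumerate(blocks):
                if (B >> r) & 1:
                    A |= block
            out[A] = v
    return out

def plausibility(m, A):
    """pl(A) for a bba given as a dict {bitmask: mass}."""
    return sum(v for B, v in m.items() if B & A)

# Hypothetical data: 4-element frame, coarsening {w1,w2}, {w3}, {w4}.
blocks = [0b0011, 0b0100, 0b1000]
m1 = {0b0001: 0.5, 0b0111: 0.3, 0b1111: 0.2}
m2 = {0b0110: 0.6, 0b1111: 0.4}
m_inner = approximate_combination([m1, m2], blocks, inner=True)
m_outer = approximate_combination([m1, m2], blocks, inner=False)
```

With these inputs the transforms operate on $2^3$ subsets instead of $2^4$, and `plausibility(m_inner, A)` and `plausibility(m_outer, A)` bracket the plausibility of the exact conjunctive combination for every subset `A`.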
4 Simulations

As an example, we simulated the conjunctive combination of 3 bba’s on a frame
$\Omega$ with $n = |\Omega| = 30$, with 500 focal sets each. The focal sets were generated
randomly in such a way that element $\omega_i$ of the frame had probability $(i/(n+1))^2$
to belong to each focal set. Hence, we simulate the realistic situation in which 
some single hypotheses are more plausible than others. The masses were assigned 
to focal sets as proposed by Tessem [15] : the mass given to the first one was taken 
from a uniform distribution on [0, 1], then a random fraction of the rest was given 
to the second one, etc. The remaining part of the unit mass was finally allocated 
to the last focal set. The conjunctive sum of the 3 bba’s was approximated using 
the method described above, using a coarsening of size c = 10. 
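The data generation described above can be sketched as follows in Python. The exponent applied to $i/(n+1)$ is exposed as a parameter `power`, and its default value of 2 is an assumption of this sketch; the function name is hypothetical. Duplicate or empty focal sets may be drawn and are not merged here.

```python
import random

def random_bba(n, n_focal, rng, power=2):
    """Draw focal sets and masses for one simulated bba.

    Element i (1-based) belongs to each focal set with probability
    (i / (n + 1)) ** power. Masses follow the scheme attributed to Tessem:
    the first mass is uniform on [0, 1], each following one is a random
    fraction of what remains, and the last focal set receives the remainder.
    """
    focal_sets = [
        frozenset(i for i in range(1, n + 1)
                  if rng.random() < (i / (n + 1)) ** power)
        for _ in range(n_focal)
    ]
    masses, rest = [], 1.0
    for _ in range(n_focal - 1):
        share = rng.random() * rest
        masses.append(share)
        rest -= share
    masses.append(rest)
    return focal_sets, masses

rng = random.Random(0)                       # fixed seed for reproducibility
bbas = [random_bba(30, 500, rng) for _ in range(3)]
```

Because each mass is a fraction of the remainder, most of the unit mass ends up on the first few focal sets and the later ones receive very small masses.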




Fig. 1. Simulation results 



A part of the results is shown in Fig. 1. The plausibilities and
implicabilities $pl^\Omega(A)$ and $b^\Omega(A)$ are plotted on the x axis
against their inner and outer approximations, for 1000 randomly selected
subsets of $\Omega$. As expected, we obtain a bracketing of the true
plausibilities and implicabilities for any $A$, since
$\underline{pl}^\Omega(A) \le pl^\Omega(A) \le \overline{pl}^\Omega(A)$ and
$\underline{b}^\Omega(A) \ge b^\Omega(A) \ge \overline{b}^\Omega(A)$. A
bracketing of $bel^\Omega(A)$ could also be obtained, as shown by Denoeux [2].

5 Conclusion 

A new method for computing inner and outer approximations of BF’s has been 
defined. Unlike previous approaches, this method does not rely on the reduction 
of the number of focal sets, but on the construction of a coarsened frame in 
which combination can be performed efficiently using the FMT algorithm. Joint 
strategies aiming at reducing the number of focal sets or the size of the frame, 
depending on the problem at hand, could be considered as well, and are left for 
further study. 
Abstract. A framework for modeling with words is proposed based on the idea
of selecting labels for values from a fixed finite set of terms. In this context 
membership degree is defined as the likelihood that a particular word is deemed 
appropriate as a label for a certain value. A calculus is described for evaluating 
linguistic expressions based on label semantics and this is compared with the 
more typical many-valued logic approaches. Reasoning with label semantics in 
specific contexts is then also considered. 



1 Introduction 

The phrase computing with words was introduced by Zadeh [11] to capture the idea
of computation based not on numerical values, but on natural language terms and
expressions. The applications of such methods are numerous and diverse. For
instance, they would facilitate the incorporation of natural language rules, as might be
provided by human experts, into formal mathematical models of complex systems. In
addition, they would provide a framework for representing and inferring (pseudo) natural
language rules from the information stored in potentially large databases as well as a
means of evaluating hypotheses expressed in natural language on the basis of such
data. These types of applications are of central importance to the area of data mining
where a primary objective is the transparency of inferred models.

The general methodology for computing with words proposed by Zadeh [1 1] is that 
of fuzzy set theory or fuzzy logic [8] and in particular is based on the idea of 
linguistic variables (see [9]). A linguistic variable is defined as a variable that takes 
natural language terms such as large, small, tall, medium ..etc as values and where the 
meaning of these words is given by fuzzy sets on some underlying domain of 
discourse. Hence, a particular expression of the form Bill is tall can be taken as 
expressing the fact that the linguistic variable describing Bill’s height has the value 
tall, and such a statement has a partial truth-value corresponding to the membership 
degree of Bill’s actual height in the fuzzy set representing the meaning of tall. The 
truth-value of compound expressions such as Bill is tall or medium is then evaluated 
according to a fuzzy set calculus based on some choice of t-norm and co-norm. 

In our view the principal problem with the above approach is that the semantics 
underlying standard fuzzy logic or indeed the notion of membership function itself is 
rather obscure. For instance, it is unclear exactly what information is conveyed by 
such statements as Bill is tall. According to Zadeh [10] the latter provides a flexible 
constraint on the variable representing Bill's height. More specifically, it tells us that 
the possibility distribution on Bill's height corresponds to the membership function of 
the fuzzy set tall. However, in our view, the semantics of possibility distributions as 
proposed in [10] are not themselves sufficiently clear for this to provide an adequate 
interpretation. Also, this association with possibility distributions does not, in itself, 
support the assumption of a truth-functional calculus for membership degrees. Now 
the above doubts about underlying meaning are of great importance given the goal of 
providing a framework to express transparent models of data. In order to understand 
rules or statements we must first be clear about their semantics. Furthermore, if such 
rules are going to be used for prediction in critical applications then some form of 
formal correctness is highly desirable. In the sequel we present an approach to 
modeling with words using the idea of label selection with a clear semantics based on 
random sets and mass assignments (see [5] for an exposition). 



2 Label Semantics 

The fundamental notion underlying label semantics is that when individuals make 
assertions of the kind described above they are essentially providing information 
about what labels are appropriate for the value of some underlying variable. For 
simplicity, we assume that for a given context only a finite set of words is available. 
This is a somewhat controversial assumption since it might be claimed that by 
applying hedges we can easily generate an infinite set of labels from an initially finite 
set of words. In other words, if tall is a possible label for Bill’s height then so is very 
tall, quite tall, very very tall and so on. This claim is problematic, however, for a 
number of reasons. For instance, it would appear that the use of hedges in natural 
language is somewhat restricted. One might use the expressions very tall and quite 
tall but very quite tall or even quite very tall are never used. Also, there seems in 
practice to be a limit on the number of times hedges can be applied to a label before it 
becomes nonsensical. This latter point seems to suggest that in practice only a finite 
number of labels may be available even in natural language. Another related difficulty 
with the use of hedges is determining the relationship between the meaning of a word 
and the meaning of any new word generated from it by application of some hedge. In 
Zadeh [9] it is suggested that such relationships have a simple functional form. For 
example, if the meaning of tall is defined by a fuzzy set with membership function 
$\mu_{tall}$ then Zadeh [9] proposes that the meaning of very tall is the fuzzy set with 
membership function $(\mu_{tall})^2$. The choice of this particular function seems relatively 
arbitrary and indeed, perhaps more fundamentally, it is far from apparent that there 
should be any such simple functional relationship between the meaning of a word and 
a new word generated by application of a hedge. In other words, we would claim that 
while hedges are a simple syntactic device for generating new labels there is no 
equally simple semantic device for generating the associated new meanings. 

Now let us return to the problem of interpreting natural language statements 
regarding, say, Bill's height as represented by variable $H$. Let us suppose then that 
there is a fixed finite set of possible labels for $H$, denoted $LA$, and that these labels 
are both known and completely identical for any individual who will make or 
interpret a statement regarding Bill's height. Given these assumptions how can we 
now interpret a statement such as Bill is tall as asserted by a particular individual $I$? 
We claim that one natural interpretation is that it merely conveys the information that, 
according to $I$, tall is an appropriate label for the value of $H$. In order to clarify this 
idea suppose $I$ knows that $H = h$ and that given this information he/she is able to 
identify a subset of $LA$ consisting of those words appropriate as labels for the value $h$. 
This set is denoted $D_h^I$, which stands for the description of $h$ given by $I$. If we allow $I$ 
to vary across some population of individuals $V$ then we naturally obtain a random set 
$D_h$ from $V$ into the power set of $LA$ such that $D_h(I) = D_h^I$. Given this we can obtain 
higher level information about the degree of applicability of a label to a value by 
defining, in this case, $\mu_{tall}(h) = \Pr(\{I \in V \mid tall \in D_h^I\})$, where the latter probability is 
calculated on the basis of some underlying prior distribution on $V$. This is similar to 
the voting model interpretation of fuzzy sets proposed by Black [1] and Gaines [4] 
although the latter required answering a binary yes/no question about whether or not $h$ 
should be included in the set tall and made no explicit reference to other possible 
labels. Similarly we can determine a probability distribution (or mass assignment) for 
the random set $D_h$ by defining $\forall S \subseteq LA$, $m_{D_h}(S) = \Pr(\{I \in V \mid D_h^I = S\})$. Now 
suppose that $I$ does not know the value of $H$ (or alternatively we do not know the 
value assigned to $H$ by $I$) then they (we) would naturally define a random set $D_H^I$ 
from the universe of $H$ into the power set of $LA$ such that $D_H^I(h) = D_h^I$. The 
distribution of this random set will clearly depend on the prior information available 
regarding the distribution of $H$. Hence, the assertion by $I$ that Bill is tall would in this 
context be interpreted as $\{tall\} \subseteq D_H^I$. Finally in the case when we have no 
information regarding $I$ then we can define a random set $D_H$ from the cross product 
of $V$ and the universe of $H$ into the power set of $LA$ such that $D_H(I, h) = D_h^I$ and 
interpret the above statement as $\{tall\} \subseteq D_H$. 



Example 2.1 

Suppose the variable SCORE with universe $\{1,2,3,4,5,6\}$ gives the outcome of a 
single throw of a particular dice. Let $LA = \{low, medium, high\}$ and $V = \{I_1, I_2, I_3\}$; 
then a possible definition of $D_{SCORE}$ is as follows: 

$D_1^{I_1} = D_1^{I_2} = D_1^{I_3} = \{low\}$, $D_2^{I_1} = \{low, medium\}$, $D_2^{I_2} = \{low\}$, $D_2^{I_3} = \{low\}$, 

$D_3^{I_1} = \{medium\}$, $D_3^{I_2} = \{medium\}$, $D_3^{I_3} = \{medium, low\}$, 

$D_4^{I_1} = \{medium, high\}$, $D_4^{I_2} = \{medium\}$, $D_4^{I_3} = \{medium\}$, 

$D_5^{I_1} = \{high\}$, $D_5^{I_2} = \{medium, high\}$, $D_5^{I_3} = \{high\}$, $D_6^{I_1} = D_6^{I_2} = D_6^{I_3} = \{high\}$ 

Now assuming a uniform prior on $V$ then the appropriateness degree for each word is 
given by 

$\mu_{low}(1) = 1$, $\mu_{low}(2) = 1$, $\mu_{low}(3) = 1/3$, 

$\mu_{medium}(2) = 1/3$, $\mu_{medium}(3) = 1$, $\mu_{medium}(4) = 1$, $\mu_{medium}(5) = 1/3$, 

$\mu_{high}(4) = 1/3$, $\mu_{high}(5) = 1$, $\mu_{high}(6) = 1$ 

and also 

$m_{D_1} = \{low\}: 1$, $m_{D_2} = \{low, medium\}: 1/3, \{low\}: 2/3$, 

$m_{D_3} = \{medium\}: 2/3, \{medium, low\}: 1/3$, 

$m_{D_4} = \{medium, high\}: 1/3, \{medium\}: 2/3$, 

$m_{D_5} = \{high\}: 2/3, \{medium, high\}: 1/3$, $m_{D_6} = \{high\}: 1$ 

If we further assume that the dice is fair then the distribution of $D_{SCORE}$ is given by 

$m_{D_{SCORE}} = \{low\}: 5/18, \{low, medium\}: 1/9, \{medium\}: 2/9, \{medium, high\}: 1/9, \{high\}: 5/18$ 
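
The figures in Example 2.1 can be checked mechanically. The following is a minimal sketch, assuming the label descriptions listed in the example (with all three voters describing a score of 6 as high) and a uniform prior on both voters and scores; the names `DESCRIPTIONS`, `mass_assignment`, `appropriateness` and `score_distribution` are illustrative and do not come from the paper.

```python
from collections import Counter
from fractions import Fraction

# Label descriptions from Example 2.1: score -> one label set per voter (I1, I2, I3).
# The entries for score 6 are an assumption consistent with m_{D_6} = {high}: 1.
DESCRIPTIONS = {
    1: [{"low"}, {"low"}, {"low"}],
    2: [{"low", "medium"}, {"low"}, {"low"}],
    3: [{"medium"}, {"medium"}, {"medium", "low"}],
    4: [{"medium", "high"}, {"medium"}, {"medium"}],
    5: [{"high"}, {"medium", "high"}, {"high"}],
    6: [{"high"}, {"high"}, {"high"}],
}


def mass_assignment(label_sets):
    """Mass of each distinct label set under a uniform prior on the voters."""
    counts = Counter(frozenset(s) for s in label_sets)
    return {s: Fraction(c, len(label_sets)) for s, c in counts.items()}


def appropriateness(label, label_sets):
    """Proportion of voters whose description contains the given label."""
    hits = sum(1 for s in label_sets if label in s)
    return Fraction(hits, len(label_sets))


def score_distribution(descriptions):
    """Distribution of D_SCORE when every score is equally likely (fair dice)."""
    weight = Fraction(1, len(descriptions))
    total = Counter()
    for label_sets in descriptions.values():
        for s, m in mass_assignment(label_sets).items():
            total[s] += m * weight
    return dict(total)


if __name__ == "__main__":
    for score, sets in DESCRIPTIONS.items():
        print(score, {lab: appropriateness(lab, sets) for lab in ("low", "medium", "high")})
    print(score_distribution(DESCRIPTIONS))
```

Exact fractions are used so that the results can be compared directly with the values quoted in the example.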

We now consider the problem of how to interpret expressions involving compound 
labels built up using some set of logical connectives. For the scope of this paper we 
consider only the three connectives $\wedge$, $\vee$ and $\neg$. This choice was based mainly on the 
fact that in the author's experience these are the most widely encountered connectives 
in the type of application discussed in the introduction. However, we freely admit that 
these are also the most straightforward connectives to define in the context of label 
semantics. Firstly let us consider the case of negation. How do we interpret 
expressions of the form Bill is not tall? We take the view here and in the sequel that 
negation is used in this case to express the non-suitability of a label. In other words 
the above statement means that tall is not an appropriate label for $H$, or $\{tall\} \not\subseteq D_H$. 

Conjunction and disjunction are then taken as having the obvious meanings so that 
Bill is tall and medium is interpreted as saying that both tall and medium are 
appropriate labels for $H$, $\{tall, medium\} \subseteq D_H$, and Bill is tall or medium is 
interpreted as saying that either tall is an appropriate label for $H$ or medium is an 
appropriate label for $H$, $\{tall\} \subseteq D_H$ or $\{medium\} \subseteq D_H$. In the following section we 
introduce a formal framework based on the ideas described above. 



3 Formal Framework 

Consider a language $L$ of the predicate calculus consisting of the set of unary 
predicate symbols $LA = \{L_1, \ldots, L_n\}$, a single variable $x$, a set of constants ranging 
across a domain of discourse $\Omega$ and connectives $\wedge$, $\vee$ and $\neg$. 

Definition 3.1 (General Label Expressions) 

The set of general label expressions of $L$, $GLE$, is defined recursively as follows: 

1. $L_i(x) \in GLE$ for $i = 1, \ldots, n$ 

2. If $\theta, \phi \in GLE$ then $\neg\theta$, $\theta \wedge \phi$, $\theta \vee \phi \in GLE$ 

Definition 3.2 (Specific Label Expressions) 

The set of specific label expressions for $L$, $SLE$, is defined by 
$SLE = \{\theta(a) \mid \theta(x) \in GLE, a \in \Omega\}$ 

Definition 3.3 (Appropriate Label Sets) 

Every $\theta \in GLE$ is associated with a set of subsets of $LA$ (i.e. an element of $2^{2^{LA}}$), 
denoted $\lambda(\theta)$, where $\lambda(\theta)$ is defined recursively as follows: 

1. $\lambda(L_i(x)) = \{S \subseteq LA \mid \{L_i\} \subseteq S\}$ 

2. $\lambda(\theta \wedge \phi) = \lambda(\theta) \cap \lambda(\phi)$ 

3. $\lambda(\theta \vee \phi) = \lambda(\theta) \cup \lambda(\phi)$ 

4. $\lambda(\neg\theta) = \overline{\lambda(\theta)}$ 

Intuitively $\lambda(\theta)$ corresponds to those subsets of $LA$ identified as being candidates for 
the set of appropriate labels for $x$ (i.e. possible values for $D_x$) by expression $\theta$. 
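
The recursion of Definition 3.3 translates almost line by line into code. The sketch below is only an illustration: the label set `LA`, the nested-tuple encoding of expressions and the function names `powerset` and `lam` are assumptions made here, not notation from the paper.

```python
from itertools import combinations

# Hypothetical label set used only for illustration.
LA = ("small", "medium", "large")


def powerset(labels):
    """All subsets of the label set, as frozensets."""
    return [frozenset(c) for r in range(len(labels) + 1) for c in combinations(labels, r)]


def lam(expr, labels=LA):
    """Appropriate label sets of an expression.

    An expression is a label name, ("not", e), ("and", e1, e2) or ("or", e1, e2).
    """
    universe = set(powerset(labels))
    if isinstance(expr, str):
        # Rule 1: all subsets containing the label.
        return {s for s in universe if expr in s}
    op = expr[0]
    if op == "not":
        # Rule 4: complement with respect to the power set of LA.
        return universe - lam(expr[1], labels)
    if op == "and":
        # Rule 2: intersection.
        return lam(expr[1], labels) & lam(expr[2], labels)
    if op == "or":
        # Rule 3: union.
        return lam(expr[1], labels) | lam(expr[2], labels)
    raise ValueError(f"unknown connective: {op!r}")


if __name__ == "__main__":
    print(sorted(map(sorted, lam(("and", "small", ("not", "large"))))))
```

Enumerating the full power set is exponential in the number of labels, which is acceptable here because the label set is assumed to be small and fixed.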

We now introduce some basic notation. Let $Val_a$ denote the set of valuations (i.e. 
allocations of truth values) on $\{L_1(a), \ldots, L_n(a)\}$ for $a \in \Omega$. For $v \in Val_a$, 
$v(L_i(a)) = true$ can be taken as meaning that $L_i$ is an appropriate label for $a$. Let 
$SLE_a = \{\theta(a) \mid \theta(x) \in GLE\}$, $SLE_a^0 = \{L_1(a), \ldots, L_n(a)\}$ and 
$SLE_a^{n+1} = SLE_a^n \cup \{\neg\varphi(a), \varphi(a) \wedge \phi(a), \varphi(a) \vee \phi(a) \mid \varphi(a), \phi(a) \in SLE_a^n\}$. 

Clearly we have that $SLE_a = \bigcup_n SLE_a^n$. From a valuation $v$ on $\{L_1(a), \ldots, L_n(a)\}$ the 
truth-value, $v(\theta)$, for $\theta \in SLE_a$ can be determined recursively in the usual way by 
application of the truth tables for the connectives. 

Definition 3.4 

Let $\tau : Val_a \to 2^{LA}$ such that $\forall v \in Val_a$, $\tau(v) = \{L_i \mid v(L_i(a)) = t\}$ (Here t=true and 
f=false). 

Notice that $\tau$ is clearly a bijection. Also note that for $v \in Val_a$, $\tau(v)$ can be associated 
with a Herbrand interpretation of the language $L$ restricted to the single constant $a$ 
(see [7]). 

Lemma 3.5 

$\forall \theta(a) \in SLE_a$, $\{\tau(v) \mid v \in Val_a, v(\theta(a)) = t\} = \lambda(\theta)$ 

Proof 

We prove this by induction on the complexity of $\theta(a)$. 

Suppose $\theta(a) \in SLE_a^0$, so that $\theta(a) = L_i(a)$ for some $i \in \{1, \ldots, n\}$. Now as $v$ ranges 
across all valuations for which $L_i(a)$ is true, then $\tau(v)$ ranges across all subsets of $LA$ 
that contain $L_i$. Hence, $\{\tau(v) \mid v \in Val_a, v(L_i(a)) = t\} = \{S \subseteq LA \mid \{L_i\} \subseteq S\} = \lambda(L_i)$ as 
required. 

Now suppose we have $\forall \theta(a) \in SLE_a^n$, $\{\tau(v) \mid v \in Val_a, v(\theta(a)) = t\} = \lambda(\theta)$ and 
consider an expression $\theta(a) \in SLE_a^{n+1}$; then either $\theta(a) \in SLE_a^n$, in which case the 
result follows trivially, or one of the following holds: 

(1) $\theta(a) = \phi(a) \wedge \varphi(a)$ where $\phi(a), \varphi(a) \in SLE_a^n$. In this case 
$\{v \in Val_a \mid v(\phi(a) \wedge \varphi(a)) = t\} = \{v \in Val_a \mid v(\phi(a)) = t\} \cap \{v \in Val_a \mid v(\varphi(a)) = t\}$. 
Hence $\{\tau(v) \mid v \in Val_a, v(\phi(a) \wedge \varphi(a)) = t\} = \{\tau(v) \mid v \in Val_a, v(\phi(a)) = t\} \cap 
\{\tau(v) \mid v \in Val_a, v(\varphi(a)) = t\} = \lambda(\phi) \cap \lambda(\varphi)$ (inductive hypothesis) $= \lambda(\phi \wedge \varphi)$ by 
definition 3.3. 

(2) $\theta(a) = \phi(a) \vee \varphi(a)$ where $\phi(a), \varphi(a) \in SLE_a^n$. In this case 
$\{v \in Val_a \mid v(\phi(a) \vee \varphi(a)) = t\} = \{v \in Val_a \mid v(\phi(a)) = t\} \cup \{v \in Val_a \mid v(\varphi(a)) = t\}$. 
Hence $\{\tau(v) \mid v \in Val_a, v(\phi(a) \vee \varphi(a)) = t\} = \{\tau(v) \mid v \in Val_a, v(\phi(a)) = t\} \cup 
\{\tau(v) \mid v \in Val_a, v(\varphi(a)) = t\} = \lambda(\phi) \cup \lambda(\varphi)$ (inductive hypothesis) $= \lambda(\phi \vee \varphi)$ by 
definition 3.3. 

(3) $\theta(a) = \neg\phi(a)$ where $\phi(a) \in SLE_a^n$. In this case $\{\tau(v) \mid v \in Val_a, v(\neg\phi) = t\} = 
\{\tau(v) \mid v \in Val_a, v(\phi) = f\} = \overline{\lambda(\phi)}$ (by the inductive hypothesis) $= \lambda(\neg\phi)$ by def. 3.3 

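Proposition 3.6 

$\forall a \in \Omega$, $\theta(a) \models \phi(a)$ iff $\lambda(\theta) \subseteq \lambda(\phi)$ 

Proof 

($\Rightarrow$) For some arbitrary $a \in \Omega$, $\theta(a) \models \phi(a) \Rightarrow \{v \in Val_a \mid v(\theta(a)) = t\} 
\subseteq \{v \in Val_a \mid v(\phi(a)) = t\} \Rightarrow \{\tau(v) \mid v \in Val_a, v(\theta(a)) = t\} 
\subseteq \{\tau(v) \mid v \in Val_a, v(\phi(a)) = t\} \Rightarrow \lambda(\theta) \subseteq \lambda(\phi)$ by lemma 3.5 

($\Leftarrow$) Suppose $\lambda(\theta) \subseteq \lambda(\phi)$. For every $a \in \Omega$, $\lambda(\theta) = \{\tau(v) \mid v \in Val_a, v(\theta(a)) = t\}$ and 
$\lambda(\phi) = \{\tau(v) \mid v \in Val_a, v(\phi(a)) = t\}$ by lemma 3.5. Therefore, 
$\{\tau(v) \mid v \in Val_a, v(\theta(a)) = t\} \subseteq \{\tau(v) \mid v \in Val_a, v(\phi(a)) = t\}$ 
$\Rightarrow \{v \in Val_a \mid v(\theta(a)) = t\} \subseteq \{v \in Val_a \mid v(\phi(a)) = t\}$ since $\tau$ is a bijection. 

Corollary 3.7 

For $\theta(x), \phi(x) \in GLE$, $\forall a \in \Omega$, $\theta(a) \equiv \phi(a)$ iff $\lambda(\theta) = \lambda(\phi)$ 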

We now introduce the notion of an interpretation in label semantics. 

Definition 3.8 (Label Interpretation) 

A Label Interpretation, $I$, of language $L$ is defined as follows: 

1. To each constant $a \in \Omega$, we assign a subset of $LA$, denoted $D_a^I$. 

2. To variable $x$ we assign a random set $D_x^I$ into $2^{LA}$. 

Given a label interpretation $I$ the meanings of label expressions are then determined 
according to the following semantic rules. 





J. Lawry 



Semantic Rules 

For $\theta \in GLE$ then under $I$, $\theta$ is interpreted as meaning $D_x^I \in \lambda(\theta)$. For $\theta(a) \in SLE$ 
then under $I$, $\theta(a)$ is interpreted as meaning $D_a^I \in \lambda(\theta)$. Obviously this has a clear 
binary truth-value since both $D_a^I$ and $\lambda(\theta)$ are precisely defined. 

Proposition 3.9 

If $\phi(a) \in SLE$ is inconsistent then it is false under all label interpretations. 

Proof 

If $\phi(a) \in SLE$ is inconsistent then $\phi(a) \equiv \theta(a) \wedge \neg\theta(a)$ so that by corollary 3.7 
$\lambda(\phi) = \lambda(\theta \wedge \neg\theta)$. Let $I$ be a label interpretation; then under $I$, $\phi(a)$ means 
$D_a^I \in \lambda(\theta \wedge \neg\theta) = \lambda(\theta) \cap \lambda(\neg\theta) = \lambda(\theta) \cap \overline{\lambda(\theta)} = \emptyset$ by definition 3.3, 
which cannot hold. 

We now introduce higher-order functions describing the selection of labels across a 
set of label interpretations. 

Definition 3.10 (Label Appropriateness Measure) 

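Let $V$ be a set of label interpretations of $L$ with associated prior distribution $P_V$; then 
we define 

$\forall S \subseteq LA$, $m_{D_a}(S) = P_V(\{I \in V \mid D_a^I = S\})$ 

In the case where $V$ is finite and $P_V$ is the uniform distribution this corresponds to 

$\forall S \subseteq LA$, $m_{D_a}(S) = \frac{|\{I \in V \mid D_a^I = S\}|}{|V|}$ 

From this mass assignment we define the appropriateness measure $\mu$ by 

$\forall \theta \in GLE, \forall a \in \Omega$, $\mu_\theta(a) = \sum_{S \in \lambda(\theta)} m_{D_a}(S)$ 

Trivially, by proposition 3.6 we have that if $\forall a \in \Omega$, $\theta(a) \models \phi(a)$ then 
$\forall a \in \Omega$, $\mu_\theta(a) \le \mu_\phi(a)$ and similarly by corollary 3.7 we have that if 
$\forall a \in \Omega$, $\theta(a) \equiv \phi(a)$ then $\forall a \in \Omega$, $\mu_\theta(a) = \mu_\phi(a)$. 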
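
As a rough illustration of Definition 3.10, the appropriateness measure can be computed by summing the masses of those label sets that satisfy the expression. The sketch below avoids enumerating $\lambda(\theta)$ explicitly: following Lemma 3.5, a label set $S$ belongs to $\lambda(\theta)$ exactly when $\theta$ is true under the valuation that makes the labels in $S$ true and all others false. The tuple encoding of expressions and the names `holds` and `mu` are assumptions of this sketch.

```python
from fractions import Fraction


def holds(expr, label_set):
    """True iff label_set lies in lambda(expr).

    An expression is a label name, ("not", e), ("and", e1, e2) or ("or", e1, e2).
    """
    if isinstance(expr, str):
        return expr in label_set
    op = expr[0]
    if op == "not":
        return not holds(expr[1], label_set)
    if op == "and":
        return holds(expr[1], label_set) and holds(expr[2], label_set)
    if op == "or":
        return holds(expr[1], label_set) or holds(expr[2], label_set)
    raise ValueError(f"unknown connective: {op!r}")


def mu(expr, mass):
    """Appropriateness of expr given a mass assignment {frozenset: mass}."""
    return sum((m for s, m in mass.items() if holds(expr, s)), Fraction(0))


if __name__ == "__main__":
    # Mass assignment for a score of 2 in Example 2.1.
    m_d2 = {frozenset({"low", "medium"}): Fraction(1, 3), frozenset({"low"}): Fraction(2, 3)}
    print(mu("medium", m_d2), mu(("not", "medium"), m_d2))
```

With the mass assignment for a score of 2 from Example 2.1, the value for medium and the value for its negation sum to one, in line with the complement rule for negation.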

Proposition 3.11 

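$\forall \theta \in GLE, \forall a \in \Omega$, $\mu_{\neg\theta}(a) = 1 - \mu_\theta(a)$ 

Proof 

$\mu_{\neg\theta}(a) = \sum_{S \in \lambda(\neg\theta)} m_{D_a}(S) = \sum_{S \notin \lambda(\theta)} m_{D_a}(S) = 1 - \sum_{S \in \lambda(\theta)} m_{D_a}(S) = 1 - \mu_\theta(a)$ 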

In order to consider the behavior of the $\mu$ operator on conjunctions and disjunctions of 
labels we introduce the notion of consonant mass assignments. More specifically, we 
will assume that $m_{D_a}$ is consonant for all $a \in \Omega$. Here, consonance has the standard 
random set meaning (see [5]) that $\forall S, S' \subseteq LA$ if both $m_{D_a}(S) > 0$ and $m_{D_a}(S') > 0$ 
then either $S \subseteq S'$ or $S' \subseteq S$. This assumption may seem, on first inspection, very 
strong. However, if we think of each label interpretation as corresponding to a voter 
then consonance simply requires the restriction that voters in $V$ differ regarding the 
composition of $D_a^I$ only in terms of its generality or specificity. In Lawry [6] it is 
proposed that voters decide on the truth-value of statements solely on the basis of 
some optimism parameter. In the current context this would mean that the higher the 
value of this parameter for $I$, the more likely that $L_i$ for any particular $i$ will be 
included in $D_a^I$. The framework presented in [6] is problematic since it assumes that 
truth-values are dependent only on the optimism parameter. Hence, for sentence $\theta$, 
the most optimistic voters regarding $\theta$ will also be the most optimistic voters 
regarding $\neg\theta$. This is counterintuitive and requires the introduction of a weaker 
notion of negation. In label semantics the negation problem is avoided since for label 
expression $\theta$, $\neg\theta$ is not interpreted as a positive assertion at all but rather as ruling 
out certain subsets of $LA$ for the value description. 

Proposition 3.12 

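If for all $a \in \Omega$, $m_{D_a}$ is a consonant mass assignment (see [5]) then for $L_i, L_j \in LA$ 
we have that $\forall a \in \Omega$, $\mu_{L_i \wedge L_j}(a) = \min(\mu_{L_i}(a), \mu_{L_j}(a))$ 

Proof 

Notice $\lambda(L_i \wedge L_j) = \lambda(L_i) \cap \lambda(L_j) = \{S \subseteq LA \mid \{L_i\} \subseteq S\} \cap \{S \subseteq LA \mid \{L_j\} \subseteq S\} 
= \{S \subseteq LA \mid \{L_i, L_j\} \subseteq S\}$. Hence, $\forall a \in \Omega$, $\mu_{L_i \wedge L_j}(a) = \sum_{S : \{L_i, L_j\} \subseteq S} m_{D_a}(S)$. 

For any $a$, since $m_{D_a}$ is a consonant mass assignment it must have the form 
$m_{D_a} = M_1 : m_1, \ldots, M_k : m_k$ where $M_r \subset M_{r+1}$ for $r = 1, \ldots, k - 1$. 

Now suppose w.l.o.g. that $\mu_{L_i}(a) \le \mu_{L_j}(a)$; then $\{L_i\} \subseteq M_r$ iff $\{L_i, L_j\} \subseteq M_r$ for 
$r = 1, \ldots, k$. Therefore, $\mu_{L_i \wedge L_j}(a) = \sum_{S : \{L_i\} \subseteq S} m_{D_a}(S) = \mu_{L_i}(a) = \min(\mu_{L_i}(a), \mu_{L_j}(a))$ 

Proposition 3.13 

If for all $a \in \Omega$, $m_{D_a}$ is a consonant mass assignment then for $L_i, L_j \in LA$ we have 
that $\forall a \in \Omega$, $\mu_{L_i \vee L_j}(a) = \max(\mu_{L_i}(a), \mu_{L_j}(a))$ 

Proof (Similar to proposition 3.12) 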

In order to compare and contrast label semantics with the many-valued logic 
approach to fuzzy reasoning we first give a formal definition of what is meant for a 
calculus to be fully truth-functional. 

Definition 3.14 (Fully Truth-functional) 

Let $\omega : GLE \times \Omega \to [0, 1]$; then $\omega$ is said to be fully truth-functional if and only if 
there exist functions $f_\neg : [0, 1] \to [0, 1]$, $f_\wedge : [0, 1]^2 \to [0, 1]$ and $f_\vee : [0, 1]^2 \to [0, 1]$ 
such that $\forall \theta, \phi \in GLE$, $\forall a \in \Omega$, $\omega_{\neg\theta}(a) = f_\neg(\omega_\theta(a))$, $\omega_{\theta \wedge \phi}(a) = f_\wedge(\omega_\theta(a), \omega_\phi(a))$, 
$\omega_{\theta \vee \phi}(a) = f_\vee(\omega_\theta(a), \omega_\phi(a))$, where $\omega_\theta(a)$ is used as shorthand for $\omega(\theta, a)$. 
Notice that propositions 3.12 and 3.13 do not imply that $\forall \theta, \phi \in GLE$, 
$\mu_{\theta \wedge \phi}(a) = \min(\mu_\theta(a), \mu_\phi(a))$ or that $\mu_{\theta \vee \phi}(a) = \max(\mu_\theta(a), \mu_\phi(a))$. For instance, 
assuming consonance, it can easily be seen from definition 3.10 that $\mu_{L_i \wedge \neg L_j}(a) = 0$ 
if $\mu_{L_i}(a) \le \mu_{L_j}(a)$ and $\mu_{L_i}(a) - \mu_{L_j}(a)$ otherwise. This is not generally equivalent to 
$\min(\mu_{L_i}(a), 1 - \mu_{L_j}(a))$. Nonetheless, if we assume consonance there is a sense in 
which the calculus of $\mu$ is functional, although not fully truth-functional, since for any 
$\theta \in GLE$ we have from definition 3.10 that $\mu_\theta(a)$ is completely determined by the 
mass assignment $m_{D_a}$ and provided the latter is consonant then it in turn is completely 
determined by the values of $\mu_{L_i}(a)$ for $i = 1, \ldots, n$ (i.e. this is due to the fact that a 
consonant mass assignment is completely determined by its fixed point coverage [5]). 
A pleasant consequence of this is that the meanings of all label expressions can be 
defined by a set of appropriateness measures $\mu_{L_i} : \Omega \to [0, 1]$ for $i = 1, \ldots, n$. One 
possible method for calculating $\mu_\theta(a)$ for general $\theta \in GLE$ is as follows: By the 
disjunctive normal form theorem we have that $\theta(a)$ is logically equivalent to a 
disjunction of atoms $\bigvee_{\alpha : \alpha \to \theta} \alpha$ where each atom is a conjunction of literals of the 
form $\alpha = \bigwedge_{i=1}^{n} \pm L_i$. Now it can easily be seen that $\lambda(\alpha)$ is a singleton consisting of the 
subset of $LA$ made up from those labels appearing positively in $\alpha$. Also by definition 
3.3 and corollary 3.7 we have that $\lambda(\theta) = \bigcup_{\alpha : \alpha \to \theta} \lambda(\alpha)$ and hence 

$\mu_\theta(a) = \sum_{\alpha : \alpha \to \theta} m_{D_a}(\lambda(\alpha))$ $^1$. 
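
The remark that a consonant mass assignment is determined by the label appropriateness degrees can be made concrete. The sketch below rebuilds a nested mass assignment from hypothetical values of $\mu_{L_i}(a)$ by ranking the labels and taking successive differences, then evaluates a compound expression against the focal sets; the function names and the tuple encoding of expressions are assumptions of this sketch rather than part of the paper.

```python
from fractions import Fraction


def consonant_mass(mu_labels):
    """Rebuild the nested (consonant) mass assignment from label degrees.

    mu_labels maps each label to its appropriateness degree for a fixed element.
    """
    ranked = sorted(mu_labels.items(), key=lambda kv: kv[1], reverse=True)
    mass = {}
    top = ranked[0][1] if ranked else Fraction(0)
    if top < 1:
        # Any remaining mass goes to the empty label set.
        mass[frozenset()] = 1 - top
    for k, (_, value) in enumerate(ranked):
        nxt = ranked[k + 1][1] if k + 1 < len(ranked) else 0
        if value > nxt:
            # Focal set: the k+1 most appropriate labels.
            focal = frozenset(label for label, _ in ranked[: k + 1])
            mass[focal] = value - nxt
    return mass


def holds(expr, label_set):
    """Evaluate a label expression with exactly the labels in label_set true."""
    if isinstance(expr, str):
        return expr in label_set
    op = expr[0]
    if op == "not":
        return not holds(expr[1], label_set)
    if op == "and":
        return holds(expr[1], label_set) and holds(expr[2], label_set)
    if op == "or":
        return holds(expr[1], label_set) or holds(expr[2], label_set)
    raise ValueError(f"unknown connective: {op!r}")


def mu(expr, mass):
    """Sum the masses of the focal sets that satisfy the expression."""
    return sum((m for s, m in mass.items() if holds(expr, s)), Fraction(0))


if __name__ == "__main__":
    half = consonant_mass({"tall": Fraction(1, 2), "medium": Fraction(1, 2)})
    # The measure gives 0 here, whereas min(mu_tall, 1 - mu_medium) would give 1/2.
    print(mu(("and", "tall", ("not", "medium")), half))
```

The example at the bottom uses two labels with equal degree 1/2, a case in which the conjunction of one label with the negation of the other receives measure 0 although the min rule applied to the component values would give 1/2.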

It should also be noted that $\mu$ satisfies the law of the excluded middle in the sense 
that $\forall a \in \Omega, \forall \theta \in GLE$, $\mu_{\theta \wedge \neg\theta}(a) = 0$, as follows immediately from proposition 3.9. 
This does not contradict the triviality result of Dubois and Prade [2] (later Elkan 
[3]) which states that any fully truth-functional logic, in the sense of definition 3.14, 
satisfying the law of the excluded middle can only be binary, since the calculus for $\mu$ 
is not fully truth-functional. More specifically, as we have seen there is no single 
binary function $f_\wedge$ such that $\forall \theta, \phi \in GLE, \forall a \in \Omega$, $\mu_{\theta \wedge \phi}(a) = f_\wedge(\mu_\theta(a), \mu_\phi(a))$. 

It should be commented that we do not see the failure of label semantics to satisfy 
the fully truth-functional property as in any way detrimental. The consonance 
assumption means that the $\mu$ function is completely determined by its values on the 
labels and hence computational feasibility is maintained. Furthermore, in our view, 
full truth-functionality is a somewhat naive assumption since it does not take into 
account the logical structure of the expressions involved when combining them. 



$^1$ We are abusing notation slightly here and taking $\lambda(\alpha)$ to correspond to the single element of 
$2^{LA}$ associated with $\alpha$ rather than the set containing that element. 
4 Context Specific Reasoning with Label Semantics 

In this section we discuss some of the issues associated with reasoning based on label 
semantics in a specific context. For instance, what additional information can be 
gained and utilised from specific knowledge of a particular $\mu$ function? In order to 
gain some insight into this issue consider the following observation. Suppose we have 
a set of label interpretations $V$ together with an underlying distribution $P_V$; then let us 
refer to the pair $(V, P_V)$ as a label frame. Now clearly from definition 3.10 $m_{D_a}$ and 
$\mu$ are defined only relative to some fixed frame. In the specific context of such a 
frame we are likely to observe that only a proper subset of the set of label sets is in 
fact possible. For instance, given $LA = \{small, medium, large\}$ we may find that in 
some frame $\Xi$ only the following occur as sets of possible labels: $\{small\}$, $\{small, 
medium\}$, $\{medium\}$, $\{medium, large\}$, $\{large\}$. We can formalize this observation by 
defining the set of focal elements for a frame $\Xi$ as 
$F_\Xi = \{S \subseteq LA \mid \exists a \in \Omega, m_{D_a}(S) > 0\}$. Assuming consonance we can determine $F_\Xi$ from 
examination of the associated $\mu$ function. We can now make the following natural 
definitions in the context of frame $\Xi$. 

Definition 4.1 

(i) (Universally follows from in $\Xi$) For $\theta, \phi \in GLE$, $\phi$ universally follows from $\theta$ in 
frame $\Xi$ (denoted $\theta \models_\Xi \phi$) iff $\lambda(\theta) \cap F_\Xi \subseteq \lambda(\phi) \cap F_\Xi$ 

(ii) (Universally equivalent to in $\Xi$) For $\theta, \phi \in GLE$, $\phi$ is universally equivalent to $\theta$ 
in frame $\Xi$ (denoted $\phi \equiv_\Xi \theta$) iff $\lambda(\phi) \cap F_\Xi = \lambda(\theta) \cap F_\Xi$ (e.g. in the frame mentioned 
above $small(x) \wedge medium(x) \equiv_\Xi small(x) \wedge medium(x) \wedge \neg large(x)$) 

(iii) (Universally true in $\Xi$) For $\theta \in GLE$, $\theta$ is universally true in $\Xi$ (denoted $\models_\Xi \theta$) 
iff $\lambda(\theta) \cap F_\Xi = F_\Xi$ (e.g. in the frame mentioned above $(small(x) \wedge \neg medium(x)) 
\vee (small(x) \wedge medium(x)) \vee (medium(x) \wedge \neg small(x)) \vee (medium(x) \wedge large(x)) 
\vee (large(x) \wedge \neg medium(x))$) 
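
The three notions in Definition 4.1 reduce to set comparisons once attention is restricted to the focal elements of the frame. The following sketch checks the two examples given in the definition for the frame over small, medium and large described above; the names `FOCAL`, `restricted`, `follows`, `equivalent` and `universally_true`, as well as the tuple encoding of expressions, are assumptions of this sketch.

```python
# Focal elements of the example frame over {small, medium, large}.
FOCAL = [
    frozenset({"small"}),
    frozenset({"small", "medium"}),
    frozenset({"medium"}),
    frozenset({"medium", "large"}),
    frozenset({"large"}),
]


def holds(expr, label_set):
    """Evaluate a label expression with exactly the labels in label_set true."""
    if isinstance(expr, str):
        return expr in label_set
    op = expr[0]
    if op == "not":
        return not holds(expr[1], label_set)
    if op == "and":
        return holds(expr[1], label_set) and holds(expr[2], label_set)
    if op == "or":
        return holds(expr[1], label_set) or holds(expr[2], label_set)
    raise ValueError(f"unknown connective: {op!r}")


def restricted(expr, focal):
    """The focal elements that satisfy the expression."""
    return {s for s in focal if holds(expr, s)}


def follows(theta, phi, focal):
    """phi universally follows from theta in the frame."""
    return restricted(theta, focal) <= restricted(phi, focal)


def equivalent(theta, phi, focal):
    """theta and phi are universally equivalent in the frame."""
    return restricted(theta, focal) == restricted(phi, focal)


def universally_true(theta, focal):
    """theta holds on every focal element of the frame."""
    return restricted(theta, focal) == set(focal)


if __name__ == "__main__":
    sm = ("and", "small", "medium")
    print(equivalent(sm, ("and", sm, ("not", "large")), FOCAL))
```

Outside the frame the two expressions in example (ii) are not equivalent, since the label set containing all three labels satisfies the first but not the second; the equivalence holds only because that set is not a focal element.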

A common example of (i) in the above definition is when a certain label is 
conceptually implied by another label. For instance, we might say that whenever 
someone is described as being very tall then they can also be described as tall. In 
fuzzy set theory this would be captured by taking the fuzzy set for very tall as a fuzzy 
subset of the fuzzy set for tall (see [8]). In label semantics we would expect to have a 
frame $\Xi$ in which whenever very tall was deemed an appropriate label so was tall. In 
other words, $very\ tall(x) \models_\Xi tall(x)$. In such a case it is not difficult to see that from 
definition 3.10 we have $\forall a \in \Omega$, $\mu_{very\ tall}(a) \le \mu_{tall}(a)$ so that in this instance fuzzy set 
theory and label semantics would coincide. 
5 Conclusions 



In this paper we have presented a formal framework for modeling with words based 
on the idea of label selection. In this framework label expressions give information 
about what words are appropriate to label a value. A higher level measure of label 
appropriateness can be defined based on the variation of the definition for the set of 
appropriate labels across a set of label interpretations. Given a consonance 
assumption the values of this measure can be determined completely from its values 
on the basic labels. We have also introduced the idea of a label frame and described 
how such a notion helps us to reason with label semantics within a specific context. 
Overall we would claim that label semantics provides a coherent and computationally 
feasible calculus for modeling with words. 
Abstract. In this paper, we first investigate set semantics of propositional 
logic in terms of rough sets and discuss how truth values of propositions 
(sentences) can be interpreted by means of equivalence classes. 
This investigation will be used to answer queries that involve general 
values of an attribute when the actual values of the attribute are more 
specific. We then explore how binary relations on singletons can be extended 
as set-based relations, in order to deal with non-deterministic 
problems in an information system. An example on test-case selection in 
telecommunications is employed to demonstrate the relevance of these 
investigations, where queries either contain values (concepts) at higher 
granularity levels or involve values of an attribute with non-deterministic 
nature or both. 



1 Introduction 

In rough sets, information and knowledge are usually represented using data tables 
or decision tables [9]. Each column in such a table is identified by an attribute, 
which describes one aspect of the objects being processed in an information 
system. Attribute values can be defined at different granularity levels, according 
to a specific requirement. In the past, most of the research work using rough 
sets has focused on how to use or manipulate or discover knowledge from the 
information carried by a table, under the assumption that attribute values in 
such a table have been chosen at the right granularity level (e.g., [10], and [11]). 

Nevertheless, problems of granules of attribute values in query processing 
have been addressed by some researchers. In [5], a high level data table is derived 
from a lower level table when the concept required by the query is not matched by 
the values of the relevant attribute in the lower level table. In this case, the values 
of the attribute in the higher level table are replaced by more general values 
(concepts). In [12], approaches to answering non-standard queries in distributed 
information systems were explored. Values of an attribute at different granule 
levels are arranged as nodes in a tree structure, with the attribute name as the 
root and the most specific values of the attribute as the leaves. In contrast to 
the two approaches above, in [1], rough predicates are defined. These predicates 
associate user-defined lower and upper approximations with attribute values, or 
with logical combinations of values, to define a rough set of tuples for the result 
of the predicates. Each predicate, similar to the definition of a function on an 
entity in a functional data model, does not define an attribute as a function on 
a relation; rather, it chooses a possible value of an attribute as a function of the 
relation. For example, if relation Horse has an attribute Age, then a predicate 
young(Horse) is defined with the lower approximation containing those horses 
with age below 1, and the upper approximation consisting of those horses with 
ages in {2,3,4}. 

Another common problem associated with attribute values in a data table 
is that an object has a set of possible values instead of just one value for a 
particular attribute. Only one of the values is actually true, but we cannot yet 
determine precisely which value it is. When a data table involves this 
kind of (non-deterministic) attribute, it is necessary to extend the usual binary 
equivalence relations to set-based relations (e.g., [13]). 

In this paper, we aim at solving these two problems when a query either 
contains values (concepts) at higher granularity levels or involves values of an 
attribute with non-deterministic nature or both. We discuss how to reason about 
knowledge at different levels from a data table using the combination of logic and 
rough sets, when values of an attribute are given at the most specific level, in 
order to answer queries. To achieve this objective, we investigate set semantics of 
propositional logic first in rough sets and then explore the relationships between 
equivalence relations and propositions in terms of partitioning a data table. We 
then discuss how extended set-based binary relations can be applied to compute 
tighter bounds when the values of an attribute are non-deterministic (set-based) [6]. 
An example on test-case selection in telecommunications is employed 
to demonstrate the relevance of our research result. The paper is organized as 
follows. Section 2 introduces the basic notions of rough sets and set-based 
computations in non-deterministic information systems. Section 3 explores the set 
semantics of propositional logic. Section 4 discusses how to apply the results to 
solve complex queries. Finally Section 5 summarizes the paper. 



2 Deterministic and Non-deterministic Information 
Systems 

Basics of rough sets: Let $U$ be a set, also called a universe, which is non-empty 
and contains a finite number of objects (this assumption will not lose general 
properties of rough sets), and $R$ be an equivalence relation on $U$. An equivalence 
relation is reflexive, symmetric and transitive. An equivalence relation $R$ on $U$ 
divides the objects in $U$ into a collection of disjoint sets with the elements in the 
same subset indiscernible. We denote each partition set, known as an equivalence 
class, as $W_i^R$ and an element in $W_i^R$ as $w_{ij}^R$. The family of all equivalence classes 
$\{W_1^R, \ldots, W_n^R\}$ is denoted as $U/R$. $W_i^R$ and $w_{ij}^R$ are simplified as $W_i$ and $w_{ij}$ 
respectively when there is no ambiguity about which equivalence relation $R$ we 
are referring to. 

Given a universe, there can be several ways of classifying objects. Let $R$ and 
$R'$ be two equivalence relations over $U$; $R \cap R'$ is a refined equivalence relation. 
$\cap$ can be understood as "and". The collection of equivalence classes of $R \cap R'$ is 

$U/(R \cap R') = \{W_i^R \cap W_j^{R'} \mid W_i^R \in U/R,\ W_j^{R'} \in U/R',\ W_i^R \cap W_j^{R'} \neq \emptyset\}$. (1) 

Equivalence relation $R_1 \cap R_2 \cap \ldots \cap R_n$, from $\mathcal{R} = \{R_1, \ldots, R_n\}$ on a universe 
$U$, is usually denoted as $IND(\mathcal{R})$ [9]. 

Definition 1. Structure $(U, \Omega, V_a)_{a \in \Omega}$ is called an information system where: 

1. $U$ is a finite set of objects, 

2. $\Omega$ is a finite set of primitive attributes describing objects in $U$, 

3. For each $a \in \Omega$, $V_a$ is the collection of all possible values of $a$. Attribute $a$ 
also defines a function, $a : U \to V_a$, such that $\forall u \in U, \exists x \in V_a, a(u) = x$. 

Such an information system is also called a deterministic information system. 
For a subset $Q \subseteq \Omega$, function $R_Q$ defined by $u_1 R_Q u_2 \iff \forall a \in Q, a(u_1) = a(u_2)$ is 
an equivalence relation. When $Q$ consists of only one attribute $a$, i.e., $Q = \{a\}$, 
$R_Q$ is called an elementary equivalence relation. In [9], each equivalence class in 
an elementary equivalence relation is referred to as an elementary concept. Any 
other non-elementary equivalence relation $R_Q$ can be represented by a set of 
elementary equivalence relations using the following expression, $R_Q = \bigcap_{a \in Q} R_{\{a\}}$. 

Definition 2. Let U be a universe and R be an equivalence relation on U . For 
a subset X of U, if X is the union of some then X is called R-definable; 
otherwise X is R-undefinable. 

Let $\mathcal{R} = \{R_1, \ldots, R_n\}$ be a collection of $n$ equivalence relations. Then any
subset $X \subseteq U$ obtained by applying $\cap$ and $\cup$ to some equivalence classes in
any $U/R$ (where $R \in \mathcal{R}$) is $IND(\mathcal{R})$-definable. This statement identifies all the
concepts that are definable under equivalence relation $IND(\mathcal{R})$. For any subset
$X$ of $U$ with a given $R$, we can also use two subsets of $U$ to describe it as follows:

$\underline{R}X = \bigcup\{W_i^R \mid W_i^R \subseteq X\}$,   $\overline{R}X = \bigcup\{W_i^R \mid W_i^R \cap X \neq \emptyset\}$.

When $X$ is $R$-definable, $\underline{R}X = \overline{R}X = X$. Subsets $\underline{R}X$ and $\overline{R}X$ are called
$R$-lower and $R$-upper approximations of $X$.
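As a small illustration of the two approximations, the following sketch computes them for a partition given as a list of classes. The function names and the numeric toy universe are assumptions made for this example only.

```python
def lower_approx(classes, x):
    """R-lower approximation: union of the classes entirely contained in x."""
    return set().union(*(w for w in classes if w <= x))


def upper_approx(classes, x):
    """R-upper approximation: union of the classes that intersect x."""
    return set().union(*(w for w in classes if w & x))


# Hypothetical partition U/R of the universe {1, ..., 5}.
toy_classes = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5})]

rough_x = {1, 2, 3}          # not a union of classes, hence R-undefinable
definable_x = {1, 2, 5}      # a union of classes, hence R-definable

print(lower_approx(toy_classes, rough_x), upper_approx(toy_classes, rough_x))
print(lower_approx(toy_classes, definable_x) == upper_approx(toy_classes, definable_x))
```

For the undefinable set the two approximations differ, while for the definable set both coincide with the set itself.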

Set-based computation: In real world applications, not all attributes in a data 
table will be assigned with single values against individual objects (e.g., [6], [8]). 
The definition below defines those information systems where an attribute can 
have a set of values for a particular object. 

Definition 3. Structure $(U, \Omega, V_a)_{a \in \Omega}$ is called a non-deterministic information
system where:

1. $U$ is a finite set of objects,

2. $\Omega$ is a finite set of primitive attributes describing objects in $U$,

3. For each $a \in \Omega$, $V_a$ is the collection of all possible values of $a$. Attribute $a$
also defines a function, $a : U \to 2^{V_a}$, such that $\forall u \in U$, $\exists S \in 2^{V_a}$, $a(u) = S$.
Table 1. A sample data table

U    Manufactures                     Color           Weight (g)   Age-Group
u1   {{UK, France}, {UK, Japan}}      {grey, black}   {1, 2}       {Infant, Toddler}
u2   {{Japan, Korea}, {UK, Japan}}    {grey}          {2, 3, 4}    {Toddler, Pre-school}
u3   {{France, Germany}}              {grey}          {2, 3, 4}    {Pre-school}
u4   {{Japan, Germany}}               {grey, brown}   {1}          {All}
u5   {{UK, France}}                   {brown}         {4, 5}       {Teenager}


Table 1 shows a non-deterministic information system (also called an attribute
system in [3]) with all attributes non-deterministic. Attribute Manufactures
is even more complicated: each object is assumed to be manufactured
jointly by two countries. When we don’t know for sure which two countries
manufactured a specific object, we assign several pairs of possible countries, as
for $u_1$. In [3], four possible explanations of the values in a(u), a set assigned to
an object against a particular attribute, are provided. We add a fifth explanation
to cover the situation shown by attribute Manufactures.
These five explanations are:

(1) a(u) is interpreted disjunctively and exclusively: one and only one value
is correct, such as the weight of an object (assume we use a closest integer to 
measure the weight of each object), 

(2) a(u) is interpreted disjunctively and non-exclusively: more than one value
may be correct, such as the (suitability of) age groups of a toy, 

(3) a(u) is interpreted conjunctively and exclusively: all the correct values 
are included, such as the color of a toy (when we list all the colors involved), 

(4) a(u) is interpreted conjunctively and non-exclusively: all the values (but 
not limited to the values) in a(u) are correct, such as the color of a toy (when 
we list main colors only), 

(5) the combination of (1) and (3): one and only one value (subset) is correct 
and this value is the combination of individual values, such as the manufactures 
of a toy. 

For the first 4 categories, set-based operations are enough to deal with at- 
tribute values. However, for category 5, we will need to use interval-based oper- 
ations, since each value itself is again a set. 

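Definition 4. (from [13]) Let $r$ be a binary relation on $V_a$, a set of possible
values of attribute $a$. A pair of extended binary relations $(r_*, r^*)$ on $2^{V_a} \setminus \{\emptyset\}$ is
defined as:

$A\, r_*\, B \iff (\forall a \in A, \forall b \in B)\ a\, r\, b$,   $A\, r^*\, B \iff (\exists a \in A, \exists b \in B)\ a\, r\, b$.   (2)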

Let $Q$ be a query that involves values in subset $B$ of $V_a$; then retrieval sets

$Ret_*(Q) = \{u_i \mid a(u_i) = A,\ A\, r_*\, B\}$;   $Ret^*(Q) = \{u_i \mid a(u_i) = A,\ A\, r^*\, B\}$,

give the lower and upper approximations of a set of objects that support query
$Q$ under condition $B$. For example, if $Q$ = ‘select grey objects’, and we set $B$ =
{grey} and let $r$ be ‘=’, then $Ret_*(Q) = \{u_2, u_3\}$ and $Ret^*(Q) = \{u_1, u_2, u_3, u_4\}$.
However, Eqs. in (2) cannot be used to deal with values for attribute Manufactures
because there may be several subsets of values assigned to an object.
Therefore, we need to further extend the equations.
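The ‘select grey objects’ example can be checked with a short sketch. The Color column is taken from Table 1; the function names `ret_lower` and `ret_upper` are assumptions introduced for this illustration.

```python
from operator import eq

# Color column of Table 1 (each object carries a set of possible colors).
color = {
    "u1": {"grey", "black"},
    "u2": {"grey"},
    "u3": {"grey"},
    "u4": {"grey", "brown"},
    "u5": {"brown"},
}


def ret_lower(column, b, r):
    """Objects whose value set A satisfies: every a in A is r-related to every b in B."""
    return {u for u, a_set in column.items() if all(r(a, y) for a in a_set for y in b)}


def ret_upper(column, b, r):
    """Objects whose value set A satisfies: some a in A is r-related to some b in B."""
    return {u for u, a_set in column.items() if any(r(a, y) for a in a_set for y in b)}


grey_lower = ret_lower(color, {"grey"}, eq)
grey_upper = ret_upper(color, {"grey"}, eq)
print(sorted(grey_lower), sorted(grey_upper))
```

Objects in the lower retrieval set are certainly grey, while objects in the upper retrieval set are possibly grey.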

Given two sets $A_1, A_2 \in 2^{V_a}$ with $A_1 \subseteq A_2$, set $A$ defined by $A = [A_1, A_2] =
\{X \in 2^{V_a} \mid A_1 \subseteq X \subseteq A_2\}$ is called a closed interval set.

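Definition 5. Let $A$ and $B$ be two interval sets from $2^{V_a}$. A pair of extended
binary relations $(\supseteq_*, \supseteq^*)$ on $2^{V_a} \setminus \{\emptyset\}$ is defined as:

$A \supseteq_* B \iff \forall X \in A, \forall Y \in B,\ X \supseteq Y$,

$A \supseteq^* B \iff \exists X \in A, \exists Y \in B,\ X \supseteq Y$.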

Let $Q$ be a query that involves conditions described in interval set $B = [B_1, B_2] \subseteq
2^{V_a} \setminus \{\emptyset\}$; then two retrieval sets $Ret_*(Q)$ and $Ret^*(Q)$ defined by

$Ret_*(Q) = \{u_i \mid a(u_i) = A,\ A \supseteq_* B\}$;   $Ret^*(Q) = \{u_i \mid a(u_i) = A,\ A \supseteq^* B\}$,   (3)

give the lower and upper approximations of a set of objects that support query
$Q$. For instance, if query $Q$ says ‘select UK manufactures related objects’ and
we set $B = [\{UK\}, \{UK\}] = \{\{UK\}\} \subseteq 2^{V_{Manufactures}} \setminus \{\emptyset\}$, then $Ret_*(Q) = \{u_1, u_5\}$
and $Ret^*(Q) = \{u_1, u_2, u_5\}$.
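The UK query can be reproduced with the following sketch, which reads the relations of Definition 5 as set inclusion between each candidate subset of an object and each subset in the query interval set. The Manufactures column is copied from Table 1; the function names are assumptions of this example.

```python
# Manufactures column of Table 1: each object carries several candidate country pairs.
manufactures = {
    "u1": [{"UK", "France"}, {"UK", "Japan"}],
    "u2": [{"Japan", "Korea"}, {"UK", "Japan"}],
    "u3": [{"France", "Germany"}],
    "u4": [{"Japan", "Germany"}],
    "u5": [{"UK", "France"}],
}


def ret_lower(column, b):
    """Objects for which every candidate subset X contains every Y in the query set B."""
    return {u for u, a in column.items() if all(x >= y for x in a for y in b)}


def ret_upper(column, b):
    """Objects for which some candidate subset X contains some Y in the query set B."""
    return {u for u, a in column.items() if any(x >= y for x in a for y in b)}


uk_query = [{"UK"}]  # the base interval set [{UK}, {UK}] = {{UK}}
uk_lower = ret_lower(manufactures, uk_query)
uk_upper = ret_upper(manufactures, uk_query)
print(sorted(uk_lower), sorted(uk_upper))
```

Object u2 appears only in the upper retrieval set because just one of its two candidate pairs involves the UK.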

3 Set Semantics of Propositional Logic 

A deterministic information system can be best demonstrated using a data ta- 
ble in rough sets. Each data table contains a number of rows labelled by objects 
(or states, processes etc.) and columns by primitive attributes. Each primitive 
attribute is associated with a set of mutually exclusive values that the attribute 
can be assigned to. Each attribute also defines an elementary equivalence relation 
and each equivalence class of the relation is uniquely identifiable by an attribute 
value. When an attribute can choose values from different value sets, only one 
of the possible value sets will be used in a particular data table. Each equiv- 
alence class in a partition is also naturally corresponding to a concept which 
can be characterized by a proper proposition. In the following, if we take
$P = \{q_1, q_2, \ldots, q_n\}$ as a finite set of atomic propositions, then as usual $\mathcal{L}(P)$ is
used to denote the propositional language formed from $P$. $\mathcal{L}(P)$ consists of $P$,
logical constants true and false, and all the sentences constructed from $P$ using
logical connectives $\{\neg, \wedge, \vee, \rightarrow, \leftrightarrow\}$ as well as parentheses (, ).

Definition 6. Let $U$ be a non-empty universe with a finite number of objects, and
$P$ be a finite set of atomic propositions. Function $val : U \times P \to \{true, false\}$ is
called a valuation function, which assigns either true or false to every ordered
pair $(u, q)$ where $u \in U$ and $q \in P$.

$val(u, q) = true$, denoted as $u \models_S q$, can be understood as $q$ is true with respect
to object $u$ in $S$, where $S = (U, \Omega, V_a)_{a \in \Omega}$ is an information system. Based on
$val$, another mapping function $v : P \to 2^U$ can be derived as:

$v(q) = \{u \mid u \in U,\ u \models_S q\}$,   (4)

where $u \in v(q)$ is interpreted as $q$ holds at state $u$ (or is proved by object $u$).
Function $v$ can be extended to a mapping $v : \mathcal{L}(P) \to 2^U$ as follows. For any
$\phi, \psi \in \mathcal{L}(P)$,

$v(\phi \wedge \psi) = \{u \mid u \in U,\ (u \models_S \phi) \text{ and } (u \models_S \psi)\}$,   (5)

$v(\phi \vee \psi) = \{u \mid u \in U,\ (u \models_S \phi) \text{ or } (u \models_S \psi)\}$,   (6)

$v(\neg\phi) = \{u \mid u \in U,\ u \not\models_S \phi\}$.   (7)

Therefore, the subset of $U$ containing those objects supporting formula $\phi$ (a
non-atomic proposition) can be derived through the initial truth assignment $val$. An
atomic proposition $q$ can be formally defined as follows: there exists one and only one
attribute $a \in \Omega$ in an information system $(U, \Omega, V_a)_{a \in \Omega}$, such that there exists
only one $x \in V_a$ with $v(q) = \{u \mid a(u) = x\}$.

Definition 7. Let $(U, R, P, val)$ be a structure where $R$ is an elementary equivalence
relation on $U$, $P$ is a finite set of atomic propositions, and $val$ is a
valuation function on $U \times P$. If there is a subset $P' = \{q_1, \ldots, q_n\}$ of $P$ such that
$U/R = \{v(q_1), v(q_2), \ldots, v(q_n)\}$ holds, then subset $P'$ is said to be equivalent to
$R$, denoted as $U/R = v(P')$.

$v(P')$ is defined as a collection of subsets of $U$, i.e., $v(P') = \{v(q_1), v(q_2), \ldots,
v(q_n)\}$ for all $q_i \in P'$. This definition suggests that there can be a subset of a
set of atomic propositions $P$ which is functionally equivalent to an elementary
equivalence relation in terms of partitioning a universe with regard to a particular
aspect (attribute) of the objects in the universe.

Table 2. A sample test case data table

U    ID   Engineer  Feature               Purpose
c1   408  N Ross    STM-4o
c2   356  N Ross    STM-1o                Undefined
c3   228  T Smith   Connections           Undefined
c4   175  T Smith   Protection Switching  {{Forced Path Protection Switch is successful
                                          when Standby Path is faulty},
                                          {Pass criteria: Path Protection to the Standby
                                          Path occurs},
                                          {Fail criteria: Path Protection to the Standby
                                          Path not occur}}
c5   226  T Smith   Synchronisation       {{STM-N/ESI ports added to the SETG priority list,
                                          Ensure ports not logically equipped not added}}
c6   214  none      2Mbit/s               Undefined
c7   48   N Ross    STM-4o
c8   50   N Ross    STM-1o                {{Can configure Alarm Severity of Card Out,
                                          Default value of Severity is Minor},
                                          {When Severity is changed Alarm should raise}}
c9   72   N Ross    STM-1o                {{Can display card type, Card variant, and
                                          Unique serial No},
                                          {Otherwise, Alarm should raise}}
c10  175  P Hay     STM-1o                {{HP-UNEQ Alarm raised when C2=00 5 times},
                                          {Alarm not raised when C2 is set 00}}
Example 1. Assume $U$ is a universe containing 10 simplified snapshots of test
cases in telecommunications (Table 2). Let $R$ be an equivalence relation on $U$
which divides $U$ into three disjoint sets, one with those cases for which the
value of Purpose is empty, one with Purpose Undefined, and one with Purpose
Defined (if the details of Purpose of an object are given, we say it is defined).
Similarly, relation $R'$, which divides $U$ into six disjoint sets based on the
names of Feature, is also an equivalence relation. The equivalence classes
generated by $R$ and $R'$ are: $U/R = \{\{c_1, c_7\}, \{c_2, c_3, c_6\}, \{c_4, c_5, c_8, c_9, c_{10}\}\}$ and
$U/R' = \{\{c_1, c_7\}, \{c_2, c_8, c_9, c_{10}\}, \{c_3\}, \{c_4\}, \{c_5\}, \{c_6\}\}$. $R = R_{\{Purpose\}}$ and
$R' = R_{\{Feature\}}$ are elementary equivalence relations, but $R \cap R'$ is not. Let
$q_1, \ldots, q_6$ be six atomic propositions, ‘A test case has feature STM-4o’, ..., ‘A
test case has feature 2Mbit/s’ respectively; these six atomic propositions divide $U$
into six disjoint subsets: $v(q_1) = \{c_1, c_7\}$, $v(q_2) = \{c_2, c_8, c_9, c_{10}\}$, $v(q_3) = \{c_3\}$,
$v(q_4) = \{c_4\}$, $v(q_5) = \{c_5\}$, and $v(q_6) = \{c_6\}$, where $v(q_i) = W_i^{R'}$ for $i = 1, \ldots, 6$.
Therefore $v(P') = U/R'$.

Definition 8. Let $(U, R, P, val)$ be a structure defined in Definition 7. For a
formula $\phi$ in $\mathcal{L}(P)$, if $v(\phi)$ defined in Eq. (4) is $R$-definable then $\phi$ is said to
be an $R$-definable formula. Otherwise, $\phi$ is $R$-undefinable. Formulae true and
false are always $R$-definable with $v(true) = U$ and $v(false) = \emptyset$.



Theorem 1. Let $(U, IND(\mathcal{R}), P, val)$ be a structure defined in Definition 7 with
$\mathcal{R} = \{R_1, R_2, \ldots, R_n\}$ containing $n$ elementary equivalence relations on $U$. When
$U/R_i = v(P_i)$ holds for $i = 1, \ldots, n$ and $P_i \subseteq P$, every formula in $\mathcal{L}(P')$ ($P' =
\bigcup_i P_i$) is an $IND(\mathcal{R})$-definable formula.

Example 2. Let $U$ be a set of objects containing a group of 10 test cases as
given in Table 2. Let $P_1$ and $P_2$ be two subsets of a set of atomic propositions
$P$ with $P_1 = \{q_{11}, q_{12}, q_{13}\}$ = {Purpose is empty, Undefined, Defined} and
$P_2 = \{q_{21}, q_{22}, q_{23}, q_{24}, q_{25}, q_{26}\}$ = {feature with STM-4o, STM-1o, Connections,
Protection-Switching, Synchronisation, 2Mbit/s}. These two subsets of atomic
propositions are equivalent to the two elementary equivalence relations, $R$ and
$R'$, in Example 1.

The following formulae:

$\phi$ = test cases with feature STM-1o and purpose given,

$\psi$ = either test cases with feature Connections or with purpose undefined,

$\varphi$ = test cases whose feature is neither STM-4o nor STM-1o and whose purpose
is known,

which can be re-written into disjunctive normal forms:

$\phi = (q_{13} \wedge q_{22})$,

$\psi = (q_{12}) \vee (q_{23})$,

$\varphi = (q_{13} \wedge \neg(q_{21} \vee q_{22})) = (q_{13} \wedge (\neg q_{21} \wedge \neg q_{22}))$

are all $R_1 \cap R_2$-definable. The subsets of objects supporting these formulae,
i.e., $v(\phi)$, $v(\psi)$, and $v(\varphi)$, are $\{c_8, c_9, c_{10}\}$, $\{c_2, c_3, c_6\}$, and $\{c_4, c_5\}$ respectively.
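The three supporting sets of Example 2 can be verified mechanically by mapping conjunction, disjunction and negation to set intersection, union and complement. The sketch below encodes the Feature column of Table 2 and the three-way Purpose classification of Example 1; the variable names and the helper `v` are assumptions of this illustration.

```python
# Feature column of Table 2 and the empty/Undefined/Defined status of Purpose.
feature = {
    "c1": "STM-4o", "c2": "STM-1o", "c3": "Connections",
    "c4": "Protection Switching", "c5": "Synchronisation", "c6": "2Mbit/s",
    "c7": "STM-4o", "c8": "STM-1o", "c9": "STM-1o", "c10": "STM-1o",
}
purpose = {
    "c1": "empty", "c2": "Undefined", "c3": "Undefined", "c4": "Defined",
    "c5": "Defined", "c6": "Undefined", "c7": "empty", "c8": "Defined",
    "c9": "Defined", "c10": "Defined",
}
universe = set(feature)


def v(column, value):
    """Objects supporting the atomic proposition 'attribute has this value' (Eq. 4)."""
    return {u for u in universe if column[u] == value}


q12, q13 = v(purpose, "Undefined"), v(purpose, "Defined")
q21, q22, q23 = v(feature, "STM-4o"), v(feature, "STM-1o"), v(feature, "Connections")

v_phi = q13 & q22                        # STM-1o and purpose given
v_psi = q12 | q23                        # purpose undefined or feature Connections
v_varphi = q13 & (universe - (q21 | q22))  # purpose known, neither STM-4o nor STM-1o

print(sorted(v_phi), sorted(v_psi), sorted(v_varphi))
```

Each result is a union of classes of the refined partition, which is why all three formulae are definable.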

Valuation function $val$ requires full information about every ordered pair
$(u, q)$ in the space $U \times P$. This is an ideal situation where for every formula
$\phi$ in $\mathcal{L}(P)$ it is possible to identify all the objects that support $\phi$, and this
set is $v(\phi)$ through Eq. (4). When a universe $U$ is very large, it may not be
practical to require function $val$ to be fully specified, but it may be quite reasonable
to have information about a particular elementary equivalence relation ($R$) and
its corresponding equivalent subset of a set of atomic propositions ($P'$). In this
case, $v(\phi)$ can be determined only when $\phi \in \mathcal{L}(P')$, as $v(\phi)$ can then be represented
using elements in $U/R$.

Still, one unavoidable question is whether it is realistic to assume
that the relevant equivalence relations (hence equivalence classes) are given as
prior knowledge. The answer may be ‘No’ for many applications; however, the
answer is ‘Yes’ for the test-case selection scenario in telecommunications, because
the feature or sub-feature of every already designed test case must be given.

Definition 9. Structure $(U, R, P, P', val)$ is called a partial rough logic theory where:

1. $U$ is a universe consisting of a finite number of objects,

2. $P$ is a finite set of atomic propositions,

3. $R$ is an elementary equivalence relation on $U$,

4. Valuation function $val$ is only partially specified on space $U \times P$,

5. $P' \subseteq P$. For each $q_i \in P'$, $v(q_i) = W_i$ and $W_i$ is in $U/R$.

Based on a partial rough logic theory $(U, R, P, P', val)$, the following equations
hold only for formulae in $\mathcal{L}(P')$.

$v(\neg\phi) = U \setminus v(\phi)$,   $v(\phi \wedge \psi) = v(\phi) \cap v(\psi)$,

$v(\phi \vee \psi) = v(\phi) \cup v(\psi)$,   $v(\phi \rightarrow \psi) = (U \setminus v(\phi)) \cup v(\psi)$.



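Each partial rough logic theory defines the set of objects supporting a formula in
$\mathcal{L}(P')$ precisely with the knowledge of the relevant elementary equivalence relation
$R$. That is, all formulae in $\mathcal{L}(P')$ are $R$-definable. For $\phi \in \mathcal{L}(P) \setminus \mathcal{L}(P')$ which is
not $R$-definable, it is only possible to define the upper and lower approximations
of $v(\phi)$.

$\underline{v}(\phi) = \bigcup\{v(\psi) \mid \psi \models \phi,\ \psi \in \mathcal{L}(P')\} = \bigcup\{W_i \subseteq v(\psi) \mid \psi \models \phi,\ \psi \in \mathcal{L}(P')\}$,   (8)

$\overline{v}(\phi) = U \setminus \underline{v}(\neg\phi)$.   (9)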

Eq. (8) defines the lower bound of the set of objects that make formula $\phi$ true
and Eq. (9) gives the upper bound of that set. The algebraic properties of $(\underline{v}, \overline{v})$
can be found in [2]. All objects in the lower bound definitely satisfy formula
$\phi$, while an object in the upper bound is not known to satisfy $\neg\phi$ and therefore
may support $\phi$. In terms of the Dempster-Shafer theory of evidence, if a frame
of discernment is defined with the equivalence classes of $R$ as its elements, then
Eq. (8) will yield a belief function and Eq. (9) will produce a plausibility function
([7]).
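A simplified sketch of Eqs. (8) and (9) is given below. It restricts the formulae $\psi$ to the atomic propositions in $P'$ and assumes that the entailments are supplied as background knowledge. The classes are those of Feature in Example 1, while the two entailment lists are purely hypothetical and chosen only to exercise the functions.

```python
# Classes v(q_i) of the Feature relation from Example 1.
feature_classes = {
    "q1": {"c1", "c7"},
    "q2": {"c2", "c8", "c9", "c10"},
    "q3": {"c3"},
    "q4": {"c4"},
    "q5": {"c5"},
    "q6": {"c6"},
}
universe = set().union(*feature_classes.values())


def lower_bound(classes, entailing):
    """Eq. (8), atoms only: union of v(q) for the q known to entail the formula."""
    return set().union(*(classes[q] for q in entailing))


def upper_bound(all_objects, classes, entailing_negation):
    """Eq. (9): complement of the lower bound of the negated formula."""
    return set(all_objects) - lower_bound(classes, entailing_negation)


# Hypothetical background knowledge (an assumption, not from the paper):
# q1 entails the formula, and q6 entails its negation.
lower = lower_bound(feature_classes, ["q1"])
upper = upper_bound(universe, feature_classes, ["q6"])
print(sorted(lower), sorted(upper))
```

Objects in `lower` must be selected, objects outside `upper` can be discarded, and the remaining objects are the undecided boundary.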
4 Reasoning about Knowledge



From general concepts (values) to specific concepts (values) or vice
versa: An information system, exemplified by a data table, provides the basic
information to answer relevant queries. Since each attribute in a data table is
confined to an exclusive set of values, some intermediate values cannot always be
explicitly shown in this table. When a query involves an intermediate value,
a system has to have an approach to matching it with the more specific/general
values available in the table. This process requires additional knowledge about
the application domain that is being dealt with. We call the tables holding the
domain knowledge meta-level tables, such as Table 3.

Table 3. A meta-data table

Feature-details                                           Feature
T25 Alarm Reporting - Unterminated Through Connections    STM-4o
T24 STM-4 Alarm Correlation - HP-REI masked by HP-RDI     STM-4o
T20 Eqpt Alarms - ALS-Dis (STM-4)                         STM-4o
T19 Eqpt Alarms - Write Protect Jumper Fitted             STM-4o
T12 Plug-in Unit Alarms - Unexpected Card                 STM-4o
T11 Plug-in Unit Alarms - Card Out                        STM-4o
T10 Loopback - Operation                                  STM-4o



Now we visit Example 1 again. In Table 2, one of the values of attribute
Feature is STM-4o. In fact, STM-4o covers a wide range of test activities, such as
T20 Eqpt Alarms - ALS-Dis (STM-4) or T10 Loopback - Operation (see Table
3 for more). Therefore, it is more useful to provide these details in a data table
than just giving STM-4o. We now replace attribute Feature with Feature-details
and update the values of Feature-details as appropriate, as shown in Table 4. For
instance, if feature STM-4o is not replaced by a set of detailed features, it is
difficult to answer the following query Q1: select test cases with features
relevant to Plug-in unit alarms. With Table 4, it is easy to answer Q1. However,
problems arise when queries like Q2 below are issued. Q2: select test cases
with STM-4o related plug-in tests.

To deal with the connections/relationships between general and specific
concepts in a given domain, meta-level knowledge needs to be available. Meta-level
tables can be used as supplements to data tables when answering queries. In
this way, knowledge “T25 Alarm Reporting $\rightarrow$ STM-4o” is stored as a record
in a meta-level table as shown in Table 3. There are in total 14 most general
features, hence 14 meta-level tables are required. Now, let us assume that $P$ is
a set of atomic propositions with $q_1$ standing for ‘A test case has feature T25
Alarm Reporting’, $q_2$ for ‘A test case has feature STM-1o’, ..., $q_7$ for ‘A test
case has feature Plug-in Unit Alarms’ respectively. Let us also assume that $R$ is
an elementary equivalence relation which partitions test cases according to their
features. Based on Definition 7, subset $P' = \{q_1, q_2, \ldots, q_7\}$ is equivalent to $R$ and
$U/R = v(P')$. Given the knowledge about $P'$ and $R$, according to Theorem 1,
every formula in $\mathcal{L}(P')$ is $R$-definable. Query Q1 above, which can be re-written
as a proposition, $\phi_1$ = ‘a test case has feature Plug-in Unit Alarms’ = $q_7$, can be
answered based on knowledge $R$. Similarly, query Q2, which means ‘a test case
has feature STM-4o and feature Plug-in’, can be expressed as $\phi_2 = (q_1 \vee q_7 \vee
q_8 \vee q_9 \vee q_{10} \vee q_{11} \vee q_{12}) \wedge q_7 = q_7$ and is also $R$-definable, where $q_8, \ldots, q_{12}$ stand for 5
atomic propositions that a test has feature with T24, T20, T19, T11, or T10
respectively (see the details given above). Therefore test cases (objects) supporting
it are obtained straightforwardly. However, query Q3: select test cases relevant
to alarms (or alarm raise), is not $R$-definable, since test cases c8 and c10 are also
relevant to alarm problems as shown in the column Purpose and they cannot
be summarized into an equivalence class of $R$. If we use $\phi_3$ to denote query Q3,
we have $q_1 \models \phi_3$ and $q_7 \models \phi_3$, where $p_1 \models p_2$ means whenever an interpretation
makes $p_1$ true, it must make $p_2$ true as well. According to Eqs. (8) and (9),

$\underline{v}(\phi_3) = \bigcup\{v(q) \mid q \models \phi_3\} = v(q_1) \cup v(q_7) = \{c_1, c_7\}$,
$\overline{v}(\phi_3) = U \setminus \underline{v}(\neg\phi_3) = U \setminus \emptyset = U$.

$\underline{v}(\phi_3)$ gives us those test cases which should definitely be selected while $\overline{v}(\phi_3)$
covers those test cases that might be selected. For this query, $\overline{v}(\phi_3)$ does not
provide much useful information, since it contains all test cases. To further eliminate
worthless test cases, we need to make use of other information in the database.
However, values of attribute Purpose cannot be used to partition the universe, due
to its non-deterministic nature. When the details of Purpose of a test case are
given, it usually contains several possible outcomes of a test, each of which may
in turn consist of several symptoms simultaneously. To model this phenomenon,
we apply the set-based computations discussed in Section 2.2.



Table 4. A set-based sample test case data table

U    Purpose-key-word
c1
c2   Undefined
c3   Undefined
c4   {{Forced Path Protection Switch, success, Standby Path faulty},
     {Path Protection, Standby Path, occur},
     {Path Protection, Standby Path, not occur}}
c5   {{Stm-N/ESI ports, Setg priority list, Ports not logically equipped, not added}}
c6   Undefined
c7
c8   {{Alarm Severity, Card Out, Default value, Severity, Minor},
     {Severity change, Alarm raise}}
c9   {{Card type, Card variant, Unique Serial No}, {Alarm raise}}
c10  {{HP-UNEQ, Alarm raised, C2=00 5 times}, {Alarm not raised, C2 set 00}}



Refining upper bounds using set-based computations: Equipped with
Definition 5 and Eqs. in (3), we revise Table 2 Purpose to obtain Table 4
Purpose-key-word (we only include this attribute in Table 4). It is worth pointing out that
when a test case has multiple values for attribute Purpose, each value is a
possible outcome of that test case and the value cannot be decided until the test
case is used in a specific test. In addition, for each possible outcome, a set of
joint descriptions is possible. In this situation, those descriptions should be read
conjunctively. For example, value {Can display card type; Card variant; Unique
serial No} means a user ‘can read card type and card variant, and unique serial
number’. In order to process the sentence descriptions in column Purpose more
efficiently, we have identified a set of key-words used in all possible purpose
specifications. The sentence descriptions of Purpose of a test case are thus replaced
by the combinations of these key-words, as shown in Table 4. Therefore, each
possible outcome identified in column Purpose-key-word can be treated as a set
of values¹. This makes set-based computations applicable.

Let $V_{purp}$ be the set of all key-word collections used for describing Purpose,
and let $p$ be a key-word appearing in a given query $Q$, logically expressed as
formula $q$; then interval set $B = [\{p\}, \{p\}] = \{\{p\}\} \subseteq 2^{V_{purp}} \setminus \{\emptyset\}$ is called a
base interval set. Furthermore, let $u_i$ be a test case in $\overline{v}(q)$, and let $A_i$ =
Purpose-key-word($u_i$) be the set of subsets of purpose key-word collections of
$u_i$. Then sets $\overline{v}(q)_*$ and $\overline{v}(q)^*$ defined by the following two equations are referred
to as the tighter upper bound and the looser upper bound of $v(q)$ respectively,

$\overline{v}(q)_* = Ret_*(Q_B) \cup \underline{v}(q) = \{u_i \mid \text{Purpose-key-word}(u_i) = A_i,\ A_i \supseteq_* B\} \cup \underline{v}(q)$,   (10)

$\overline{v}(q)^* = Ret^*(Q_B) \cup \underline{v}(q) = \{u_i \mid \text{Purpose-key-word}(u_i) = A_i,\ A_i \supseteq^* B\} \cup \underline{v}(q)$.   (11)

It is observed that the purpose of a test case may not be defined, and it is
also possible that the purpose of a test case in $\underline{v}(q)$ is either not defined
or does not contain key-word $p$. Therefore, we have to union $\underline{v}(q)$ with the selected
set. Also, subscript $B$ of $Q$ can be omitted if there is no confusion about which
base interval set we refer to.

When a query $Q$ involves several key-words and generates multiple base
interval sets, $B_1, \ldots, B_j$, Eqs. (10) and (11) will be repeatedly applied to all base
interval sets. For any two base interval sets $B_i$ and $B_j$ defined from two
distinct key-words, the effect of conjunction or disjunction of the key-words in a
query will be reflected by the computation of joint tighter/looser bounds using
the following equations:

$\overline{v}(q_{B_i \text{ and } B_j})_* = \overline{v}(q_{B_i})_* \cap \overline{v}(q_{B_j})_*$,   $\overline{v}(q_{B_i \text{ and } B_j})^* = \overline{v}(q_{B_i})^* \cap \overline{v}(q_{B_j})^*$;
$\overline{v}(q_{B_i \text{ or } B_j})_* = \overline{v}(q_{B_i})_* \cup \overline{v}(q_{B_j})_*$,   $\overline{v}(q_{B_i \text{ or } B_j})^* = \overline{v}(q_{B_i})^* \cup \overline{v}(q_{B_j})^*$.

Now looking back at query Q3, if we assume $B$ = [{Alarm},
{Alarm}] = {{Alarm}}, and apply Eqs. (10) and (11) to $\overline{v}(\phi_3)$, we get $\overline{v}(\phi_3)_* =
\{c_8, c_{10}\} \cup \{c_1, c_7\}$ and $\overline{v}(\phi_3)^* = \{c_8, c_9, c_{10}\} \cup \{c_1, c_7\}$.
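The two refined bounds for the alarm query can be reproduced with the sketch below. It uses the defined Purpose-key-word entries of Table 4 and the lower bound {c1, c7} stated in the text. One point is an assumption of this sketch rather than something the paper states: a key-word is taken to match an outcome when it occurs (case-insensitively) inside one of the outcome's key-word phrases, since the table lists phrases such as "Alarm raise" rather than the bare word "Alarm".

```python
# Defined Purpose-key-word values from Table 4 (undefined or empty entries omitted).
purpose_keywords = {
    "c4": [{"Forced Path Protection Switch", "success", "Standby Path faulty"},
           {"Path Protection", "Standby Path", "occur"},
           {"Path Protection", "Standby Path", "not occur"}],
    "c5": [{"Stm-N/ESI ports", "Setg priority list",
            "Ports not logically equipped", "not added"}],
    "c8": [{"Alarm Severity", "Card Out", "Default value", "Severity", "Minor"},
           {"Severity change", "Alarm raise"}],
    "c9": [{"Card type", "Card variant", "Unique Serial No"}, {"Alarm raise"}],
    "c10": [{"HP-UNEQ", "Alarm raised", "C2=00 5 times"},
            {"Alarm not raised", "C2 set 00"}],
}
definite = {"c1", "c7"}  # lower bound of the alarm query, as given in the text


def mentions(outcome, keyword):
    """Assumed matching rule: the key-word occurs inside some phrase of the outcome."""
    return any(keyword.lower() in phrase.lower() for phrase in outcome)


def tighter_bound(table, keyword, lower):
    """Eq. (10): every possible outcome mentions the key-word, plus the lower bound."""
    return {u for u, outs in table.items() if all(mentions(o, keyword) for o in outs)} | lower


def looser_bound(table, keyword, lower):
    """Eq. (11): at least one possible outcome mentions the key-word, plus the lower bound."""
    return {u for u, outs in table.items() if any(mentions(o, keyword) for o in outs)} | lower


tight = tighter_bound(purpose_keywords, "Alarm", definite)
loose = looser_bound(purpose_keywords, "Alarm", definite)
print(sorted(tight), sorted(loose))
```

Test case c9 falls only in the looser bound because just one of its two possible outcomes involves an alarm.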

5 Conclusion 

In this paper, we have presented novel approaches to coping with two common
problems usually involved in a query: general concepts that are not explicitly
defined in a data table and non-deterministic values among a set of possible
choices. A logic-based method is used to deal with the former while set-based
computations are applied to the latter.

¹ In fact, we rename the existing attribute Purpose as Purpose-description and add
an additional attribute Purpose-key-word. In this way, we will be able to look at the
detailed descriptions of test case purposes for those selected test cases.

The method in [5] is not applicable to the test case selection problem, since
a large number of attributes (24) are involved in the test case data table, which has
thousands of records (test cases). It is not practical to re-generate the whole test
case table every time a query is issued. The approach in [1] is also inadequate
for this specific application because there are no user defined bounds available 
to generate possible predicates. However, our mechanism is very similar to the 
knowledge representation schema in [12], where a tree is used to represent all the 
possible values of an attribute at different levels. Instead of using trees, we use 
meta-level tables to do the same job. Each meta-level table, equivalent to a tree 
in [12], can have more than two columns, with the most specific values in the 
far-left column and the most general values at the far-right. The manipulation 
and maintenance of these tables are almost identical to any data table in an 
information system, so there is very little extra work involved in building these 
meta-level tables. 
Abstract. We put forward a model for transmission channels and channel
coding which is possibilistic rather than probabilistic. We define a
notion of possibilistic capacity, which is connected to a combinatorial 
notion called graph capacity. In the probabilistic case graph capacity is 
a relevant quantity only when the allowed decoding error probability is 
strictly equal to zero, while in the possibilistic case it is a relevant quan- 
tity for whatever value of the allowed decoding error possibility; as the 
allowed error possibility becomes larger the possibilistic capacity stepwise 
increases (one can reliably transmit data at a higher rate). We discuss 
an application, in which possibilities are used to cope with uncertainty 
as caused by a “vague” linguistic description of channel noise. 



1 Introduction 

The coding-theoretic approach to information measures was first taken by Shannon
when he laid down the foundations of information theory in his seminal
paper of 1948 [14], and has proved to be quite successful; it has led to such
important probabilistic functionals as source entropy or channel capacity. Below
we shall adopt a model for transmission channels which is possibilistic rather
than probabilistic; this will lead us to define a notion of possibilistic capacity in
much the same way as one arrives at the corresponding probabilistic notion. We
are confident that our coding-theoretic approach may help to shed light on,
if not to disentangle, the vexed question of defining adequate information
measures in possibility theory (non-probabilistic, or “unorthodox”, information
measures are covered, e.g., in [11] or [12]). In [16] a general theory of possibilistic
data transmission is put forward, covering both source coding and channel coding;
besides possibilistic capacity, a notion of possibilistic entropy is also defined
there, and an interpretation of possibilistic coding is discussed, which is based
on distortion measures as currently used in probabilistic source coding.

We recall that the capacity of a probabilistic channel is an asymptotic parameter;
more precisely, it is the limit value for the rates of optimal codes, used to
protect information from channel noise; the codes one considers are constrained
to satisfy a reliability criterion of the type: the decoding error probability of the
code should be at most equal to a tolerated value $\epsilon$, $0 \le \epsilon < 1$. A streamlined
description of channel codes will be given below in Section 4; even from our fleeting
hints it is however apparent that, at least a priori, the capacity of a channel
depends on the value $\epsilon$ which has been chosen to specify the reliability criterion. If
in the probabilistic models the mention of $\epsilon$ is usually omitted, the reason is that
the asymptotic value for the optimal rates is the same whatever the value of $\epsilon$,
provided however that $\epsilon$ is strictly positive. A zero-error reliability criterion leads
instead to quite a different quantity, called zero-error capacity. The zero-error
problem of data protection in noisy channels is exceedingly difficult, and has led
to a new and fascinating branch of information theory and combinatorics, called
zero-error information theory, which has been overviewed quite recently in [13].
In particular, the zero-error capacity of a probabilistic channel is expressed in
terms of a remarkable combinatorial notion called Shannon’s graph capacity.

So, even in the case of probabilistic capacity one deals with a step function
of $\epsilon$, which assumes only two distinct values, one for $\epsilon = 0$ and the other for
$\epsilon > 0$. We shall adopt a model of the channel which is possibilistic rather than
probabilistic, and shall choose a reliability criterion of the type: the decoding
error possibility should be at most equal to $\epsilon$, $0 \le \epsilon < 1$. As shown below, the
possibilistic analogue of capacity exhibits quite a perspicuous stepwise behaviour
as a function of $\epsilon$, and so the mention of $\epsilon$ cannot be disposed of. As for the
“form” of the functional one obtains, it is of the same type as in the
zero-error probabilistic case, even when the tolerated error possibility is strictly
positive: the capacities of possibilistic channels are always expressed in terms of
graph capacities. In the possibilistic case, however, as one loosens the reliability
criterion by allowing a larger error possibility, the relevant graph changes and
the capacity of the possibilistic channel increases.

We describe the contents of the paper. In Section 2, after some preliminaries 
on possibility theory, possibilistic channels are introduced. Section 3 contains 
simple lemmas, which are handy tools apt to “translate” probabilistic zero-error 
results into the framework of possibility theory. In Section 4, after giving a 
streamlined description of channel coding, possibilistic capacity is defined and a 
coding theorem is provided. Up to Section 4, our point of view is rather abstract: 
the goal is simply to understand what happens when one replaces probabilities 
by possibilities in the current models of data transmission. A discussion of the 
practical meaning of our proposal is instead deferred to Section 5; possibilities 
are seen as numeric counterparts for “vague” linguistic judgements. 

In this paper we take the asymptotic point of view which is typical of Shannon 
theory, but one might prefer to take the constructive point of view of algebraic 
coding: as a first step in this direction, in [1] and [9] a possibilistic decoding 
strategy has been examined which is derived from minimum Hamming distance 
decoding. We deem that the need for a solid theoretical foundation of “soft” 
coding, as possibilistic coding basically is, is proved by the fact that several 
ad hoc coding algorithms are already successfully used in practice, which are 
not based on probabilistic descriptions of the source or of the channel; such 
descriptions, which are derived from statistical estimates, are often too costly to 
obtain, or even unfeasible, and at the same time they are uselessly detailed. 

The paper aims at a minimum level of self-containment, and so we have
briefly re-described certain notions of information theory which are quite standard;
for more details we refer the reader, e.g., to [3] or [4]. As for possibility theory,
and in particular for a clarification of the elusive notion of non-interactivity,
which is often seen as the natural possibilistic analogue of probabilistic independence
(cf. Section 2), we mention [5], [6], [8], [10], [11], [17].

2 Possibilistic Channels 

We recall that a possibility distribution $\Pi$ over a finite set $\mathcal{A} = \{a_1, \ldots, a_k\}$,
called the alphabet, is defined by giving a possibility vector $\Pi = (\pi_1, \pi_2, \ldots, \pi_k)$
whose components $\pi_i$ are the possibilities $\Pi(a_i)$ of the $k$ singletons $a_i$ ($1 \le i \le k$,
$k \ge 2$); the possibility of each subset $A \subseteq \mathcal{A}$ is the maximum of the possibilities
of its elements:

$\Pi(a_i) = \pi_i$,   $0 \le \pi_i \le 1$,   $\max_{1 \le i \le k} \pi_i = 1$,   $\Pi(A) = \max_{a_i \in A} \pi_i$   (2.1)

In particular 7T(0) = 0, 11(A) = 1. In logical terms taking a maximum means 
that event A is e-possible when at least one of its elements is so, in the sense of 
a logical disjunction. 
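The maxitive rule in (2.1) can be illustrated by a minimal Python sketch. The three-letter alphabet and the numeric values are hypothetical, chosen only to satisfy the normalisation condition; the function name `possibility` is likewise an assumption of this sketch.

```python
def possibility(pi, subset):
    """Possibility of a subset: the maximum over its elements (0 for the empty set)."""
    return max((pi[a] for a in subset), default=0.0)

# Hypothetical possibility vector over a three-letter alphabet.
pi = {"a1": 1.0, "a2": 0.5, "a3": 0.0}

# Normalisation required by (2.1): the largest component equals 1.
assert max(pi.values()) == 1.0

A, B = {"a2"}, {"a2", "a3"}
# Maxitivity: the possibility of a union is the max of the two possibilities.
assert possibility(pi, A | B) == max(possibility(pi, A), possibility(pi, B))
```

Unlike a probability, the value assigned to a set is never larger than the value of its most possible element, whatever the size of the set.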

Instead, probability distributions are defined through a probability vector
$P = (p_1, p_2, \ldots, p_k)$, and have an additive nature, rather than a maxitive one.
With respect to probabilities, an empirical interpretation of possibilities is less 
clear. The debate on the meaning and the use of possibilities is an ample and 
long-standing one; the reader is referred to standard texts on possibility theory, 
e.g., those quoted at the end of Section 1; cf also Section 5, where the applicability 
of our model to real-world data transmission is discussed. 

Let $\mathcal{A} = \{a_1, \ldots, a_k\}$ and $\mathcal{B} = \{b_1, \ldots, b_h\}$
be two alphabets, called in this context the input alphabet and the output
alphabet, respectively. Probabilistic channels are usually described by giving a
stochastic matrix $W$ whose rows are headed to the input alphabet $\mathcal{A}$
and whose columns are headed to the output alphabet $\mathcal{B}$. The $k$ rows
of such a stochastic matrix are probability vectors over the output alphabet
$\mathcal{B}$; each entry $W(b|a)$ is interpreted as the transition probability
from the input letter $a \in \mathcal{A}$ to the output letter
$b \in \mathcal{B}$. A stationary and memoryless channel $W^n$, or SML channel,
extends $W$ to $n$-tuples, and is defined by setting for each
$x = x_1 x_2 \ldots x_n \in \mathcal{A}^n$ and each
$y = y_1 y_2 \ldots y_n \in \mathcal{B}^n$:

$$W^n(y|x) = W^n(y_1 y_2 \ldots y_n | x_1 x_2 \ldots x_n) = \prod_{i=1}^{n} W(y_i|x_i) \qquad (2.2)$$

Note that $W^n$ is itself a stochastic matrix whose rows are headed to the
sequences in $\mathcal{A}^n$, and whose columns are headed to the sequences in
$\mathcal{B}^n$. The memoryless nature of the channel is expressed by the fact
that the $n$ transition probabilities $W(y_i|x_i)$ are multiplied.

We now define the possibilistic analogue of stochastic (probabilistic) matrices.
The $k$ rows of a possibilistic matrix $\Psi$ with $h$ columns are possibility
vectors over the output alphabet $\mathcal{B}$. Each entry $\Psi(b|a)$ will be
interpreted as the transition possibility^2 from the input letter
$a \in \mathcal{A}$ to the output letter $b \in \mathcal{B}$; cf. the example
given below. In Definition 2.1 $\Psi$ is such a possibilistic matrix.

^1 The fact that the same symbol is used both for vectors and for distributions
will cause no confusion; similar conventions will be tacitly adopted also in the
case of matrices and channels.

Definition 2.1. A stationary and non-interactive channel $\Psi^{[n]}$, or SNI
channel, extends $\Psi$ to $n$-tuples and is defined as follows:

$$\Psi^{[n]}(y|x) = \Psi^{[n]}(y_1 y_2 \ldots y_n | x_1 x_2 \ldots x_n) = \min_{1 \le i \le n} \Psi(y_i|x_i) \qquad (2.3)$$

Products as in (2.2) are replaced by a minimum operation; this expresses the
fact that the extension is non-interactive. Note that $\Psi^{[n]}$ is itself a
possibilistic matrix whose rows are headed to the sequences in $\mathcal{A}^n$,
and whose columns are headed to the sequences in $\mathcal{B}^n$. Taking the
minimum of the $n$ transition possibilities $\Psi(y_i|x_i)$ may be interpreted as
a logical conjunction: only when all the transitions are $\epsilon$-possible is
it $\epsilon$-possible to obtain output $y$ from input $x$. We deem that
Section 5 will vindicate the adequacy of the SNI model in situations of
practical interest. If $B$ is a subset of $\mathcal{B}^n$, one has in accordance
with the last equality of (2.1):

$$\Psi^{[n]}(B|x) = \max_{y \in B} \Psi^{[n]}(y|x)$$

Example 2.1. For $\mathcal{A} = \mathcal{B} = \{a, b\}$ we show a possibilistic
matrix $\Psi$ and its “square” $\Psi^{[2]}$, which specifies the transition
possibilities from input couples to output couples. The possibility that $a$ is
received when $b$ is sent is $\delta$; this is also the possibility that $aa$ is
received when $ab$ is sent, say; $0 < \delta < 1$.
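The non-interactive extension (2.3) and the "square" of Example 2.1 can be sketched in Python as follows. The names `sni_extension` and `set_possibility`, the numeric value of `DELTA`, and in particular the entry for receiving b when a is sent (set to 0 here) are assumptions of this sketch; the text only fixes the possibility that a is received when b is sent.

```python
from itertools import product

def sni_extension(psi, n):
    """Extend psi[input][output] to n-tuples with a minimum, as in (2.3)."""
    inputs = sorted(psi)
    outputs = sorted(next(iter(psi.values())))
    return {
        x: {y: min(psi[xi][yi] for xi, yi in zip(x, y))
            for y in product(outputs, repeat=n)}
        for x in product(inputs, repeat=n)
    }

def set_possibility(row, subset):
    """Possibility of a set of output sequences: the max over its members."""
    return max((row[y] for y in subset), default=0.0)

DELTA = 0.3  # hypothetical value with 0 < DELTA < 1
psi = {
    "a": {"a": 1.0, "b": 0.0},    # possibility of b given a: ASSUMED to be 0
    "b": {"a": DELTA, "b": 1.0},  # possibility of a given b is DELTA, as in the text
}
square = sni_extension(psi, 2)

# Sending ab and receiving aa has possibility min(1, DELTA) = DELTA.
print(square[("a", "b")][("a", "a")])
```

Every row of the extended matrix is again a possibility vector, since the input sequence itself is always received with possibility 1.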
3 A Few Lemmas

Sometimes the actual value of a probability does not matter, what matters is
only whether that probability is zero or non-zero, i.e., whether the
corresponding event $E$ is “impossible” or “possible”. The canonical
transformation maps probabilities to binary (zero-one) possibilities by setting
Poss{E} = 0 if and only if Prob{E} = 0, else Poss{E} = 1; this transformation
can be applied to the components of a probability vector $P$ or to the
components of a stochastic matrix $W$ to obtain a possibility vector $\Pi$ or a
possibilistic matrix $\Psi$, respectively. Below we shall introduce a more
general notion called $\epsilon$-equivalence. It will appear that a matrix
$\Psi$ obtained canonically from $W$ is $\epsilon$-equivalent to $W$ for
whatever value of $\epsilon$ (here and in the sequel $\epsilon$ is a real number
such that $0 \le \epsilon < 1$).

^2 Of course transition probabilities and transition possibilities are
conditional probabilities and conditional possibilities, respectively, as made
clear by our notation which uses a conditioning bar. We have avoided mentioning
explicitly the notion of conditional possibilities because they are the object
of a debate which is far from being closed (cf., e.g., Part II of [5]);
actually, the worst problems are met when one starts by assigning a joint
distribution and wants to compute the marginal and conditional ones. In our case
it is instead conditional possibilities that are the starting point: as argued
in [2], “prior” conditional possibilities are not problematic, or rather they
are no more problematic than possibilities in themselves.

402 A. Sgarro

Definition 3.1. A stochastic matrix $W$ and a possibilistic matrix $\Psi$ are
said to be $\epsilon$-equivalent when the following double implication holds
$\forall a \in \mathcal{A}$, $\forall b \in \mathcal{B}$:

$$W(b|a) = 0 \iff \Psi(b|a) \le \epsilon$$

However simple, the following lemma 3.1 is the basic tool used to convert 
probabilistic zero-error results into possibilistic ones (the straightforward proofs 
of the two lemmas below are omitted) . 

Lemma 3.1. Fix $n \ge 1$. The stochastic matrix $W$ and the possibilistic matrix
$\Psi$ are $\epsilon$-equivalent if and only if the following double implication
holds $\forall x \in \mathcal{A}^n$, $\forall B \subseteq \mathcal{B}^n$:

$$W^n(B|x) = 0 \iff \Psi^{[n]}(B|x) \le \epsilon$$
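As an illustration of the canonical transformation and of Definition 3.1, here is a minimal Python sketch. The function names and both example matrices are hypothetical; matrices are stored as nested dictionaries indexed first by input letter and then by output letter.

```python
def canonical(W):
    """Canonical transformation: zero probability -> 0, positive probability -> 1."""
    return {a: {b: (0.0 if p == 0 else 1.0) for b, p in row.items()}
            for a, row in W.items()}

def eps_equivalent(W, psi, eps):
    """Definition 3.1: W(b|a) = 0 if and only if psi(b|a) <= eps, for all a and b."""
    return all((W[a][b] == 0) == (psi[a][b] <= eps) for a in W for b in W[a])

# Hypothetical stochastic matrix: each row sums to 1.
W = {"a": {"a": 0.9, "b": 0.1},
     "b": {"a": 0.0, "b": 1.0}}

# The canonical image is eps-equivalent to W for every eps in [0, 1).
assert all(eps_equivalent(W, canonical(W), eps) for eps in (0.0, 0.5, 0.99))

# A hypothetical non-binary possibilistic matrix is equivalent only for some eps.
psi = {"a": {"a": 1.0, "b": 0.4},
       "b": {"a": 0.2, "b": 1.0}}
print([eps for eps in (0.1, 0.2, 0.3, 0.5) if eps_equivalent(W, psi, eps)])
```

In the second case the thresholds that work are exactly those lying between the largest entry that must count as "zero" and the smallest entry that must count as "non-zero".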

In Sections 4 and 5 on channel coding we shall need the following notion of
confoundability between letters: two input letters $a$ and $a'$ are confoundable
for the probabilistic matrix $W$ if and only if there exists at least one output
letter $b$ such that the transition probabilities $W(b|a)$ and $W(b|a')$ are
both strictly positive. Given matrix $W$, one can construct a confoundability
graph $G(W)$, whose vertices are the letters of $\mathcal{A}$, by joining two
letters by an edge if and only if they are confoundable.

We now define a similar notion for possibilistic matrices. To this end we first
introduce a proximity index between any two input letters $a$ and $a'$:

$$\sigma_\Psi(a,a') = \max_{b \in \mathcal{B}} \left[ \Psi(b|a) \wedge \Psi(b|a') \right]$$

Above the wedge symbol $\wedge$ stands for a minimum and is used only to improve
readability. We observe that $\sigma_\Psi(a,a')$ is a proximity relation in the
technical sense of fuzzy set theory.

Example 3.1. We return to Example 2.1 above. One has:
$\sigma_\Psi(a,a) = \sigma_\Psi(b,b) = 1$, $\sigma_\Psi(a,b) = \delta$. With
respect to $\Psi^{[2]}$ the proximity of two letter couples $x$ and $x'$ is
either 1 or $\delta$, according to whether $x = x'$ or $x \ne x'$ (recall that
$\Psi^{[2]}$ can be viewed as a possibilistic matrix over the “alphabet” of
letter couples). Cf. also Example 4.1 and the application worked out in
Section 5.

Definition 3.2. Once a possibilistic matrix $\Psi$ and a number $\epsilon$ are
given ($0 \le \epsilon < 1$), two input letters $a$ and $a'$ are defined to be
$\epsilon$-confoundable if and only if their proximity exceeds $\epsilon$:

$$\sigma_\Psi(a,a') > \epsilon$$

Given $\Psi$ and $\epsilon$, one constructs the $\epsilon$-confoundability graph
$G_\epsilon(\Psi)$, whose vertices are the letters of $\mathcal{A}$, by joining
two letters by an edge if and only if they are $\epsilon$-confoundable for
$\Psi$. If $W^n$ and $\Psi^{[n]}$ are seen as matrices with $k^n$ rows headed to
$\mathcal{A}^n$ and $h^n$ columns headed to $\mathcal{B}^n$, one can consider
also the confoundability graphs $G(W^n)$ and $G_\epsilon(\Psi^{[n]})$ for the
$k^n$ input sequences of length $n$: as one soon checks, two “vertices” (two
distinct input sequences) $x = x_1 x_2 \ldots x_n$ and $u = u_1 u_2 \ldots u_n$
are joined by an edge if and only if for each component $i$ either $x_i = u_i$,
or $x_i$ and $u_i$ are adjacent; $1 \le i \le n$. Observe that this sort of
extension to a “power-graph” $G^n$ on vertex sequences of length $n$ can be
performed starting from any simple graph $G$, i.e., from any graph without loops
and without multiple edges; $G^n$ is called the strong power of $G$. If one uses
strong powers, one can indifferently write $G(W^n)$ or $(G(W))^n$,
$G_\epsilon(\Psi^{[n]})$ or $(G_\epsilon(\Psi))^n$, respectively.

Lemma 3.2. If the stochastic matrix $W$ and the possibilistic matrix $\Psi$ are
$\epsilon$-equivalent, the two confoundability graphs $G(W^n)$ and
$G_\epsilon(\Psi^{[n]})$ coincide for each length $n > 0$.
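The proximity index, the confoundability graph and the strong power can be sketched in Python, together with a numerical check that the confoundability graph of the extended matrix coincides with the strong power of the single-letter graph. The three-letter matrix and all function names are hypothetical; graphs are represented simply by their edge sets.

```python
from itertools import combinations, product

def proximity(psi, a, a2):
    """Max over output letters b of min(psi[a][b], psi[a2][b])."""
    return max(min(psi[a][b], psi[a2][b]) for b in psi[a])

def confoundability_graph(psi, eps):
    """Edge set: pairs of distinct inputs whose proximity exceeds eps."""
    return {frozenset(pair) for pair in combinations(sorted(psi), 2)
            if proximity(psi, *pair) > eps}

def strong_power(vertices, edges, n):
    """Edge set of the n-th strong power of the simple graph (vertices, edges)."""
    def equal_or_adjacent(u, v):
        return u == v or frozenset((u, v)) in edges
    seqs = list(product(sorted(vertices), repeat=n))
    return {frozenset((x, u)) for x, u in combinations(seqs, 2)
            if all(equal_or_adjacent(xi, ui) for xi, ui in zip(x, u))}

def sni_extension(psi, n):
    """Non-interactive extension: componentwise minimum of the transitions."""
    letters = sorted(psi)
    return {x: {y: min(psi[xi][yi] for xi, yi in zip(x, y))
                for y in product(letters, repeat=n)}
            for x in product(letters, repeat=n)}

# Hypothetical possibilistic matrix psi[input][output]; each row has maximum 1.
psi = {"a": {"a": 1.0, "b": 0.5, "c": 0.0},
       "b": {"a": 0.0, "b": 1.0, "c": 0.5},
       "c": {"a": 0.2, "b": 0.0, "c": 1.0}}

for eps in (0.0, 0.2, 0.5):
    single = confoundability_graph(psi, eps)
    assert confoundability_graph(sni_extension(psi, 2), eps) == \
        strong_power(psi.keys(), single, 2)
```

The check succeeds because the proximity of two sequences under the minimum extension is the minimum of the componentwise proximities.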



We still need a combinatorial notion; cf. [4] and [13]. Take any simple graph
$G$ and let $\alpha(G^n)$ be the independence number^3 of the strong-power graph
$G^n$.

Definition 3.3. The limit of $n^{-1} \log \alpha(G^n)$ when $n$ goes to infinity
is called the graph capacity $C(G)$ of the graph $G$.



The minimum value of Shannon’s graph capacity, as it is also called, is zero:
just take a complete graph. The maximum value of the capacity of a graph with
$k$ vertices is $\log k$: just take a graph without edges. It is rather easy to
prove that

$$\log \alpha(G) \le n^{-1} \log \alpha(G^n) \le \log \chi(\bar{G}) \qquad (3.1)$$

and so whenever $\alpha(G) = \chi(\bar{G})$ the graph capacity is very simply
$C(G) = \log \alpha(G)$; here $\chi(\bar{G})$ is the chromatic number of the
complementary graph $\bar{G}$, whose edges are exactly those which are lacking
in $G$. Unfortunately, a single-letter expression of graph capacity is so far
unknown, at least in general (“single-letter” means that one is able to
calculate explicitly the limit so as to get rid of the length $n$). E.g., let us
take the case of a polygon $P_k$ with $k$ vertices. For $k = 3$, we have a
triangle $P_3$; then $\alpha(P_3) = \chi(\bar{P}_3) = 1$ and the capacity
$C(P_3)$ is zero. Let us go to the quadrangle $P_4$; then
$\alpha(P_4) = \chi(\bar{P}_4) = 2$ and so $C(P_4) = 1$. In the case of the
pentagon, however, $\alpha(P_5) = 2 < \chi(\bar{P}_5) = 3$. It was quite an
achievement of Lovász to prove in 1979 that $C(P_5) = \log \sqrt{5}$, as
conjectured for more than twenty years. The capacity of the heptagon $P_7$ is
still unknown.
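The polygon examples can be checked numerically with a small Python sketch. It brute-forces the independence numbers of the triangle, quadrangle and pentagon, and then verifies that a particular set of five vertex pairs is independent in the strong square of the pentagon, which yields the lower bound of log sqrt(5) bits on the capacity. The sketch does not prove the matching upper bound, which is Lovász's result quoted above; all function names are assumptions.

```python
from itertools import combinations
from math import isclose, log2, sqrt

def independence_number(vertices, adjacent):
    """Brute force: size of a largest set with no two adjacent vertices."""
    vertices = list(vertices)
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if not any(adjacent(u, v) for u, v in combinations(subset, 2)):
                return size
    return 0

def cycle_adjacent(k):
    """Adjacency in the polygon with vertices 0..k-1."""
    return lambda u, v: (u - v) % k in (1, k - 1)

def strong_adjacent(adjacent):
    """Adjacency in a strong power: distinct, and equal or adjacent componentwise."""
    def adj(x, u):
        return x != u and all(a == b or adjacent(a, b) for a, b in zip(x, u))
    return adj

for k in (3, 4, 5):
    print(k, independence_number(range(k), cycle_adjacent(k)))

# Five pairwise non-adjacent vertices in the strong square of the pentagon.
S = [(i, (2 * i) % 5) for i in range(5)]
adj2 = strong_adjacent(cycle_adjacent(5))
assert not any(adj2(x, u) for x, u in combinations(S, 2))

# Rate of this length-2 codebook, in bits per letter.
rate = log2(len(S)) / 2
assert isclose(rate, log2(sqrt(5)))
```

The set S shows that two uses of the pentagon already allow five distinguishable sequences, whereas a single use allows only two letters.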

^3 We recall that an independent set in a graph, called also a stable set, is a
set of vertices no two of which are adjacent. The size of a maximal (i.e.,
largest) independent set is called the independence number of the graph.
4 The Capacity of a Possibilistic Channel

We start with the following general observation. The elements which define a
code $f$, i.e., the encoder $f^+$ and the decoder $f^-$ (cf. below), do not
require a probabilistic or a possibilistic description of the channel. One must
simply choose the input alphabet $\mathcal{A}$ and the output alphabet
$\mathcal{B}$; one must also specify a length $n$, which is the length of the
codewords which are sent through the channel. Once these elements,
$\mathcal{A}$, $\mathcal{B}$ and $n$, have been chosen, one can construct a code
$f$, i.e., a couple encoder/decoder. Then one can study the performance of $f$
by varying the “behaviour” of the channel: for example one can first assume that
this behaviour has a probabilistic nature, while later one changes to a less
committal possibilistic description.

We give a streamlined description of what a channel code is; for more details
we refer to [3], [4], and also to [13], which is specifically devoted to
zero-error information theory. The basic elements of a code $f$ are the encoder
$f^+$ and the decoder $f^-$. The encoder $f^+$ is an injective (invertible)
mapping which takes uncoded messages onto a set of codewords
$\mathcal{C} \subseteq \mathcal{A}^n$; the set $\mathcal{M}$ of uncoded messages
is left unspecified, since its “structure” is irrelevant. Codewords are sent as
input sequences through a noisy medium, or noisy channel. They are received at
the other end of the channel as output sequences which belong to
$\mathcal{B}^n$. The decoder $f^-$ takes back output sequences to the codewords
of $\mathcal{C}$, and so to the corresponding uncoded messages. This gives rise
to a partition of $\mathcal{B}^n$ into decoding sets, one for each codeword
$c \in \mathcal{C}$. Namely, the decoding set $\mathcal{D}_c$ for codeword $c$
is $\mathcal{D}_c = \{y : f^-(y) = c\} \subseteq \mathcal{B}^n$.

The most important feature of a code $f = (f^+, f^-)$ is its codebook
$\mathcal{C} \subseteq \mathcal{A}^n$ of size $|\mathcal{C}|$. The decoder
$f^-$, and so the decoding sets $\mathcal{D}_c$, are often chosen by use of some
decision-theoretic principle, but we shall not need any special assumption. The
encoder $f^+$ will never be used in the sequel, and so its specification is
irrelevant. The rate of a code $f$ with codebook $\mathcal{C}$ is defined as

$$R_n = n^{-1} \log |\mathcal{C}|$$

The number $\log |\mathcal{C}|$ can be seen as the (not necessarily integer^4)
binary length of the uncoded messages, the ones which carry information; then
the rate is interpreted as a transmission speed, which is measured in
information bits (bit fractions, rather) per transmitted bit. The idea is to
design codes which are fast and reliable at the same time. Once a reliability
criterion has been chosen, one tries to find the optimal code for each
pre-assigned codeword length $n$, i.e., a code with highest rate among those
which meet the criterion.

Let us consider a stationary and memoryless channel $W^n$, or SML channel, as
defined in (2.2). To declare a code $f$ reliable, one requires that the
probability that the output sequence does not belong to the correct decoding set
is acceptably low, i.e., below a pre-assigned threshold $\epsilon$,
$0 \le \epsilon < 1$. If one wants to play safe, one has to insist that the
decoding error should be low for each codeword $c \in \mathcal{C}$ which might
have been transmitted. The reliability criterion which a code $f$ must meet is
thus:

$$\max_{c \in \mathcal{C}} W^n(\neg \mathcal{D}_c | c) \le \epsilon \qquad (4.1)$$

^4 In Shannon theory one often indulges in the slight but convenient inaccuracy
of allowing non-integer “lengths”. By the way, the logarithms here and below are
all to the base 2, and so the unit we choose for information measures is the
bit. Bars denote size, i.e., number of elements. Notice that, so as not to
overload our notation, the mention of the length is not made explicit in the
symbols which denote coding functions and codebooks.

The symbol $\neg$ denotes negation, or set-complementation; of course the
inequality sign in (4.1) can be replaced by an equality sign whenever
$\epsilon = 0$. Once the length $n$ and the threshold $\epsilon$ are chosen, one
can try to determine the rate $R_n = R_n(W, \epsilon)$ of an optimal code which
solves the optimization problem:

Maximize the code rate $R_n$ so as to satisfy the constraint (4.1)

The job can be quite tough, however, and so one often has to be content with
the asymptotic value of the optimal rates $R_n$, which is obtained when the
codeword length $n$ goes to infinity. This asymptotic value is called the
$\epsilon$-capacity of channel $W$. For $0 < \epsilon < 1$ the capacity
$C_\epsilon$ is always the same, only the speed of convergence of the optimal
rates to $C_\epsilon$ is affected by the choice of $\epsilon$. When one says
“capacity” one refers by default to the positive $\epsilon$ case^5; cf. [3] or
[4].

Instead, when $\epsilon = 0$ there is a dramatic change. In this case one uses
the confoundability graph $G(W)$ associated with channel $W$; cf. Section 3. As
easily checked, for $\epsilon = 0$ the codebook
$\mathcal{C} \subseteq \mathcal{A}^n$ of an optimal code is precisely a maximal
independent set of $G(W^n)$. Consequently, the zero-error capacity $C_0(W)$ of
channel $W$ is equal to the capacity of the corresponding confoundability graph
$G(W)$, as defined at the end of Section 3:

$$C_0(W) = C(G(W)) \qquad (4.2)$$

The paper [15] which Shannon published in 1956 and which contains these results 
inaugurated zero-error information theory. Observe however that the equality 
(4.2) gives no real solution for the problem of assessing the zero-error capacity 
of the channel, but simply re-phrases it in a neat combinatorial language; recall 
that a single-letter expression of graph capacity is so far unknown, at least in 
general. 

We now pass to a stationary and non-interactive channel $\Psi^{[n]}$, or SNI
channel, as defined in (2.3). The reliability criterion (4.1) is correspondingly
replaced by:

$$\max_{c \in \mathcal{C}} \Psi^{[n]}(\neg \mathcal{D}_c | c) \le \epsilon \qquad (4.3)$$

The optimization problem is now:

Maximize the code rate $R_n$ so as to satisfy the constraint (4.3)

The number $\epsilon$ is now the error possibility which we are ready to accept.
Again the inequality sign in (4.3) is to be replaced by the equality sign when
$\epsilon = 0$.

Definition 4.1. The $\epsilon$-capacity of channel $\Psi$ is the limit of
optimal code rates $R_n(\Psi, \epsilon)$, obtained as the codeword length $n$
goes to infinity.

^5 The capacity relative to a positive error probability allows one to construct
sequences of codes whose probability of a decoding error is actually
infinitesimal; this point of view does not make much sense for possibilistic
models, which are intrinsically “discrete”.

The following lemma is soon obtained from lemma 3.1, and in its turn soon 
implies theorem 4.1 (use also lemma 3.2); it states that possibilistic coding and 
zero-error probabilistic coding are different formulations of the same mathemat- 
ical problem. 

Lemma 4.1. Let the SML channel $W$ and the SNI channel $\Psi$ be
$\epsilon$-equivalent. Then a code $f = (f^+, f^-)$ satisfies the reliability
criterion (4.1) at zero error for the probabilistic channel $W$ if and only if
it satisfies the reliability criterion (4.3) at $\epsilon$-error for the
possibilistic channel $\Psi$.

Theorem 4.1. The codebook $\mathcal{C} \subseteq \mathcal{A}^n$ of an optimal
code for criterion (4.3) is a maximal independent set of
$G_\epsilon(\Psi^{[n]})$. Consequently, the $\epsilon$-capacity of the
possibilistic channel $\Psi$ is equal to the capacity of the corresponding
$\epsilon$-confoundability graph $G_\epsilon(\Psi)$:

$$C_\epsilon(\Psi) = C(G_\epsilon(\Psi))$$

As for the decoding sets $\mathcal{D}_c$ of an optimal code, their specification
is straightforward: one decodes $y$ to the unique codeword $c$ for which
$\Psi^{[n]}(y|c) > \epsilon$; if $\Psi^{[n]}(y|c) \le \epsilon$ for all
$c \in \mathcal{C}$, then $y$ can be assigned to any decoding set, this choice
being irrelevant from the point of view of criterion (4.3). Below we stress
explicitly the obvious fact that the graph capacity $C_\epsilon(\Psi)$ is a
stepwise non-decreasing function of $\epsilon$, $0 \le \epsilon < 1$; the term
“consecutive” refers to an ordering of the distinct components $\pi_i$ which
appear in $\Psi$ ($\pi_1$ can be zero even if zero does not appear as an entry
in $\Psi$):

Proposition 4.1. If $0 \le \epsilon < \epsilon' < 1$, then
$C_\epsilon(\Psi) \le C_{\epsilon'}(\Psi)$. If $\pi_i < \pi_{i+1}$ are two
consecutive entries in $\Psi$, then $C_\epsilon(\Psi)$ is constant for
$\pi_i \le \epsilon < \pi_{i+1}$.

Example 4.1. A “rotating” channel. Take $k = 5$; the quinary input and output
alphabet is the same; the possibilistic matrix $\Psi$ “rotates” the row-vector
$(1, \delta, \tau, 0, 0)$, in which $0 < \tau < \delta < 1$. After setting by
circularity $a_6 = a_1$, $a_7 = a_2$, one has:
$\sigma(a_i, a_i) = 1 > \sigma(a_i, a_{i+1}) = \delta > \sigma(a_i, a_{i+2}) = \tau$,
$1 \le i \le 5$. Capacities can be computed as explained at the end of
Section 3: $C_0(\Psi) = 0$ (the corresponding graph is complete),
$C_\tau(\Psi) = \log \sqrt{5}$ (the pentagon graph pops up),
$C_\delta(\Psi) = \log 5$ (the corresponding graph is edge-free). So
$C_\epsilon(\Psi) = 0$ for $0 \le \epsilon < \tau$,
$C_\epsilon(\Psi) = \log \sqrt{5}$ for $\tau \le \epsilon < \delta$, else
$C_\epsilon(\Psi) = \log 5$.
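The rotating channel of Example 4.1 can be sketched in Python. The direction of rotation (each row shifted one place to the right of the previous one), the numeric values of `DELTA` and `TAU`, and all function names are assumptions of this sketch. The helper `capacity_bits` is valid for this example only: it merely recognises which of the three graphs arises and returns the capacity value quoted in the text, without computing any graph capacity.

```python
from itertools import combinations
from math import log2, sqrt

def rotating_matrix(row):
    """Row i is the base row cyclically shifted i places to the right (assumed direction)."""
    k = len(row)
    return [[row[(j - i) % k] for j in range(k)] for i in range(k)]

def proximity(psi, i, j):
    """Max over outputs of the min of the two transition possibilities."""
    return max(min(p, q) for p, q in zip(psi[i], psi[j]))

def edges(psi, eps):
    """Edge set of the eps-confoundability graph on inputs 0..k-1."""
    return {(i, j) for i, j in combinations(range(len(psi)), 2)
            if proximity(psi, i, j) > eps}

def capacity_bits(psi, eps):
    """Capacity for THIS example only, using the values quoted in the text."""
    k, e = len(psi), edges(psi, eps)
    if len(e) == k * (k - 1) // 2:
        return 0.0               # complete graph
    if not e:
        return log2(k)           # edge-free graph
    return log2(sqrt(5))         # pentagon: value quoted from Lovász's result

DELTA, TAU = 0.6, 0.3  # hypothetical values with 0 < TAU < DELTA < 1
psi = rotating_matrix([1.0, DELTA, TAU, 0.0, 0.0])

for eps in (0.0, 0.29, TAU, 0.59, DELTA, 0.9):
    print(eps, len(edges(psi, eps)), capacity_bits(psi, eps))
```

The printed capacities change only when the threshold crosses an entry of the matrix, which is the stepwise behaviour stated in Proposition 4.1.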

Remark 4.1. In the probabilistic case a constraint of the type
Prob{$\neg C$} $\le \epsilon$ can be re-written in terms of the probability of
correct decoding as Prob{$C$} $\ge 1 - \epsilon$, because
Prob{$C$} + Prob{$\neg C$} = 1. Instead, the sum Poss{$C$} + Poss{$\neg C$} can
be strictly larger than 1, and so Poss{$C$} $\ge 1 - \epsilon$ is a different
constraint. This constraint, however, would be quite loose and quite
uninteresting; actually, the possibility Poss{$C$} of correct decoding and the
error possibility Poss{$\neg C$} can be both equal to 1 at the same time.

Remark 4.2. Theorem 4.1 has been proved by simply recycling a result already
available in the probabilistic framework; in [16] we show that the introduction
of possibility values which are intermediate between zero and one does enlarge^6
the probabilistic framework of zero-error coding. Namely, in [16] we solve a
problem which is not encountered in the probabilistic theory, by proving that
the capacity does not change when one relaxes criterion (4.3), and one uses the
average possibility of error rather than the maximal one. We expect that more
significant novelties will be obtained by the consideration of meaningful
interactive models, i.e., by an “aggregation” of single transition possibilities
different from (2.3).



5 An Application of the Possibilistic Model 

We have examined a possibilistic model of data transmission and coding which 
is inspired by the standard probabilistic model: what we did is simply replacing 
probabilities by possibilities and independence by non-interactivity, a notion 
which is often seen as the “right” analogue of probabilistic independence in 
possibility theory. In this section we shall discuss an application. The reader is 
referred to [16] for a systematic interpretation of our possibilistic model of data 
transmission, which is based on distortion measures; here we shall only deal
with a rather ad hoc example. Think of the keys in a digital keyboard, such as
the one of the author’s telephone, say, in which digits from 1 to 9 are arranged
on a 3 x 3 grid, left to right, top row to bottom row, while digit 0 is
positioned below digit 8. It may happen that, when a telephone number is
dialled, the wrong key is pressed. We assume the following model of the “noisy
channel”, in which possibilities are seen as numeric labels for “uncertain
linguistic judgements” (only the ordering of the labels counts, not the actual
numeric values):

1. it is quite plausible that the correct key is pressed (possibility 1)

2. it is less plausible that one inadvertently presses a “neighbour” of the
correct key, i.e., a key which is positioned on the same row or on the same
column and is contiguous to the correct key (possibility 1/2)

3. everything else is quite implausible (possibility 0)

^6 The new possibilistic frame includes the traditional zero-error probabilistic
frame: it is enough to take possibilities which are equal to zero when the
probability is zero, and equal to one when the probability is positive, whatever
its value.

Using these values^7 one can construct a possibilistic matrix $\Psi$ with the
input and the output alphabet both equal to the set of the ten keys. As for the
proximity $\sigma_\Psi(a, b)$, it is 1/2 whenever either keys $a$ and $b$ are
neighbours, or there is a third key $c$ which is a common neighbour of both. One
has, for example: $\Psi(a|1) = 1/2$ for $a \in \{2, 4\}$,
$\sigma_\Psi(a, 1) = 1/2$ for $a \in \{2, 3, 4, 5, 7\}$. When the wrong key is
pressed, we shall say that a cross-over of type 2 or of type 3 has taken place,
according to whether its possibility is 1/2 or 0. A codebook is a bunch of
admissible telephone numbers of length $n$; since a phone number is wrong
whenever there is a collision with another phone number in a single digit, it is
natural to assume that the “noisy channel” $\Psi$ is non-interactive. If the
allowed error possibility of the code is as large as 1/2 or more, the
confoundability graph is edge-free and no error protection is required. If
instead the error possibility is 0, the output sequence $y$ is decoded to the
single codeword $c$ for which $\Psi^{[n]}(y|c) > 0$; so, error correction is
certainly successful if there has been no cross-over of type 3. This example was
suggested to us by J. Körner; however, at least in principle, in the standard
probabilistic setting one would have to specify a stochastic matrix $W$ such as
to be 0-equivalent with $\Psi$. In $W$ only the opposition zero/non-zero would
count; unfortunately, the empirical meaning of $W$ is not at all clear, and has
nothing to do with the actual probabilities with which errors are committed by
the hand of the operator; these probabilities would give one more stochastic
matrix $W' \ne W$. So, the adoption of a “hard” probabilistic model is in this
case pretty unnatural. Instead, in a “soft” possibilistic approach one specifies
just one possibilistic matrix $\Psi$, which contains precisely the information
which is needed and nothing more.

Unfortunately, the author’s telephone is not especially promising. In this case
one has $C_0(\Psi) = \log \alpha(G_0(\Psi)) = \log 3$; in other words the
0-capacity, which is an asymptotic parameter, is reached already for $n = 1$. To
see this use inequalities (3.1). The independence number of $G_0(\Psi)$ is 3,
and a maximal independent set of keys, which are far enough from each other so
as not to be confoundable, is $\{0, 1, 6\}$, as easily checked; moreover, one
checks that 3 is also the chromatic number of the complementary graph. In
practice, this means that an optimal codebook as in Theorem 4.1 may be
constructed by juxtaposition of the input “letters” 0, 1, 6. If, for example,
one dials number 2244 rather than 1111 a successful error correction takes
place; actually, $\Psi^{[4]}(2244|c) > 0$ only for $c = 1111$. If instead one is
so clumsy as to dial the “quite implausible” number 2225, this is incorrectly
decoded to 1116. The code is disappointing, since everything boils down to
allowing only phone numbers which use keys 0, 1, 6. The design of convenient
keyboards such that their possibilistic capacity is not obtained already for
$n = 1$ is a graph-theoretic problem which may be of relevant practical interest
in those situations when dialling an incorrect number may cause serious
inconveniences.

^7 We might have chosen a “negligible” positive value, rather than 0: this would
have made no difference, save adding an equally negligible initial interval
where the channel capacity would have been zero.
Abstract. New semantics for numerical values given to possibility mea-
sures are provided. For epistemic possibilities, the new approach is based 
on the semantics of the transferable belief model, itself based on betting 
odds. It is shown that the least informative among the belief structures 
that are compatible with prescribed betting rates is a possibility mea- 
sure. It is also proved that the idempotent conjunctive combination of 
two possibility measures corresponds to the hyper-cautious conjunctive 
combination of the belief functions induced by the possibility measures. 
For objective possibility degrees, the semantics is based on the most in- 
formative possibilistic approximation of a probability measure. We show 
how the idempotent disjunctive combination of possibility functions is 
related to the convex mixture of probability distributions. 



1 Introduction 

Quantitative possibility theory has been proposed as a numerical model which 
could represent quantified uncertainty [32, 9, 3]. In order to sustain this claim, it 
is necessary to examine the representation power of possibility theory regarding 
uncertainty in both objective and subjective contexts. In the objective context, 
quantitative possibility can be devised as an approximation of upper and lower 
frequentist probabilities, due to the presence of incomplete statistical observa- 
tions [6, 15]. In the subjective context, quantitative possibility theory somehow 
competes with the probabilistic model in its personalistic or Bayesian views and 
with the transferable belief model (TBM) [28,24,25], both of which also intend 
to represent degrees of belief. A major issue when developing formal models 
that represent psychological quantities (belief is such an object) is to produce 
an operational definition of what these degrees are supposed to quantify. Such 
an operational definition, and the assessment methods that can be derived from 
it, provide a meaning, a semantics, to the .7 encountered in statements like ‘my
degree of belief is .7’. Such an operational definition was produced long ago by
the Bayesians. They claim that any state of incomplete knowledge of an agent can
be modeled by a single probability distribution on the appropriate referential,
and that the probabilities can be revealed by a betting experiment in which the
agent provides betting odds under an exchangeable bet assumption. A similar
setting exists for imprecise probabilities [29], relaxing the assumption of
exchangeable bets, and more recently for the TBM as well [28, 27], introducing
several betting frames corresponding to various partitions of the referential.
In that sense, the numerical values encountered in these three models are well
defined.

** This work was partially realized while the last author was Visiting Professor
at IRIT, Université Paul Sabatier, Toulouse, France.

Quantitative possibility theory (QPT) did not have such a wealth of opera- 
tional definitions so far, despite an early proposal by Giles [17] in the setting of 
upper and lower probabilities, recently taken over by De Cooman and colleagues 
[30,1]. One way to avoid the measurement problem is to develop a qualitative 
epistemic possibility theory where only ordering relations are used [11]. 

Nevertheless QPT seems to be a theory worth exploring as well, and rejecting 
it because of the current lack of convincing semantics would be unfortunate. The 
recent revival of a form of subjectivist QPT due to De Cooman and colleagues, 
and the development of possibilistic networks based on incomplete statistical 
data [16] suggests on the contrary that it is fruitful to investigate various op- 
erational semantics for possibility theory. This is due to several reasons: first 
possibility theory is a special case of most existing non additive uncertainty the- 
ories, be they numerical or not. Hence progress in one of these theories usually 
has impact in possibility theory. Another major reason is that possibility theory 
is very simple, certainly the simplest competitor for probability theory. Hence it 
can be used as a useful approximate representation by other theories. A last reason
is that previous works have suggested strong links between possibility theory and 
non-Bayesian statistics, especially the use of likelihood functions without prior 
[22], and confidence intervals. It is not absurd to think that, in the future, pos- 
sibility theory may contribute to unify and shed some light on some aspects of 
non-Bayesian statistics. 

The aim of this paper is to propose two new semantics for possibility theory: 
a subjectivist semantics and an objectivist one. We use the term ‘subjectivist’ 
to mean that we consider the concepts of beliefs (how much we believe) and 
betting behaviors (how much would we pay to enter into a game) without re- 
gard to the possible random nature and repeatability of the events. We use the 
term ‘objectivist’ to mean that we consider data generated by random processes 
where repetition is natural, and where histograms can summarize the data. The 
distinction is somehow similar to the one made between the personal and the 
frequential interpretations of probabilities. It also reflects that in the ‘subjec- 
tivist’ case, we start from a betting behavior, whereas in the ‘objectivist’ case 
we start from a histogram. 

The subjective semantics differs from the upper and lower probabilistic setting
of the subjective possibility proposed by Giles and followers, without
questioning its merit. Instead of making the bets non-exchangeable, we assume
that the exchangeable betting rates only imperfectly reflect the agent’s
beliefs. The objectivist semantics suggests a flexible extension of particular
confidence intervals.

Moreover we show that the basic combination rules in possibility theory, 
minimum and maximum, can be interpreted in the proposed settings: the former 
using a minimal commitment assumption in the subjectivist setting; the latter 
using an information preservation principle in the frequentist setting. 

This paper provides an overview of these semantics. Detailed theorems and 
proofs can be found in the long version of this paper, which pursues an investigation started in [26]. Up-to-date presentations of the TBM and possibility theory
can be found in [25, 11], respectively.

2 Subjectivist semantics 

2.1 The transferable belief model and bets 

It has long been realized that possibility functions are mathematically identical to consonant plausibility functions [21], so using the semantics of the
transferable belief model (TBM) to produce a semantics for quantitative epistemic
possibility theory is an obvious approach, even if it has not been explored in depth
so far. What was missing was to show that the analogy goes further.

Suppose You (the agent who holds the beliefs) consider what beliefs You
should adopt about the actual value of a variable ranging over the frame of
discernment $\Omega$. You have decided that Your beliefs should be those produced by
a fully reliable source, and the beliefs are represented by a belief function and
its associated basic belief assignment (bba). The basic belief mass assigned to
each set is the weight given to the fact that all You may know from the source
is that the value of the variable of interest lies somewhere in that set. A belief
function (resp. plausibility function) is a set-function that assigns to each event
(subset of the ‘frame of discernment’) the sum of the masses given to its subsets
(resp. to the subsets that intersect it). It evaluates to what extent the event is
implied by (resp. consistent with) the available evidence. When the sets with
positive mass are nested, the plausibility function is called a possibility measure,
and can be characterized, just like probability, by an assignment of weights to
singletons, called a possibility distribution.

Should You know the beliefs of the source, they would be Yours. Unfortunately, it happens that You only know the values of the ‘pignistic’ probabilities the
source would use to bet on the actual value of the variable [23, 28]. The pignistic probability induced by a belief function is built by defining a uniform probability on each
set of positive mass, and taking the convex mixture of these probabilities
according to the mass function. Knowing the probabilities
allocated to the elements of $\Omega$ is not sufficient to construct a unique underlying
belief function: many belief functions can induce the same probabilities. So all
You know about the belief function that represents the source’s beliefs is that it
belongs to the set of belief functions that induce the supplied pignistic probabilities.

Since several belief functions lead to the same betting rates, You have to
select the one that most plausibly reflects the actual state of belief of the source.
A cautious approach is to obey a ‘least commitment principle’ that states that
You should never presuppose more beliefs than justified. Then, You can select
the ‘least committed’ element in the family of belief functions compatible with
the pignistic probability function prescribed by the obtained betting rates. The
first result of this paper is that the least committed belief function is consonant,
that is, the corresponding plausibility function is a possibility function. This
possibility function is the unique one in the set of belief functions having a prescribed pignistic probability, because the pignistic transformation is a bijection
between possibilities and probabilities. So this possibility function turns out to
be the least committed belief function whose pignistic transformation is equal
to the pignistic probabilities supplied by the source.

More formally, let m(A) be the basic belief mass allocated to subset A. The
function m is called a basic belief assignment (bba). The sum of these masses
across all events is 1. The degrees of belief bel(A) and plausibility pl(A) are
defined for all $A \subseteq \Omega$ by:

$$bel(A) = \sum_{\emptyset \neq B \subseteq A} m(B), \qquad pl(A) = \sum_{B \cap A \neq \emptyset} m(B) = bel(\Omega) - bel(\overline{A}).$$

In order to emphasize the fact that we work with non-normalized belief functions ($m(\emptyset)$ can be positive), we use the notation bel and pl, whereas Shafer
uses the notation Bel and Pl. Another useful function that is also in one-to-one
correspondence with any of m, bel and pl is the commonality function q such
that: $q(A) = \sum_{B : A \subseteq B} m(B)$.
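
The three set functions can be computed directly from a bba. The following is a minimal Python sketch under stated assumptions: the frame `OMEGA`, the masses in `m` and the function names are illustrative choices, not taken from the paper; a bba is stored as a dictionary from `frozenset` to mass.

```python
# Illustrative frame and bba (assumed values, chosen only for the example).
OMEGA = frozenset({"a", "b", "c"})
m = {
    frozenset({"a"}): 0.5,
    frozenset({"a", "b"}): 0.3,
    OMEGA: 0.2,
}

def bel(m, a):
    """Sum of the masses of the non-empty subsets of a."""
    return sum(v for b, v in m.items() if b and b <= a)

def pl(m, a):
    """Sum of the masses of the sets that intersect a."""
    return sum(v for b, v in m.items() if b & a)

def q(m, a):
    """Sum of the masses of the supersets of a."""
    return sum(v for b, v in m.items() if a <= b)

A = frozenset({"b"})
print(bel(m, A), pl(m, A), q(m, A))
# Duality between bel and pl: pl(A) = bel(Omega) - bel(complement of A).
print(abs(pl(m, A) - (bel(m, OMEGA) - bel(m, OMEGA - A))) < 1e-12)
```

The empty set is excluded from `bel` explicitly (the `if b` test), which matters only when $m(\emptyset) > 0$.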

2.2 Consonant belief functions 

A belief function is said to be consonant iff its focal elements are nested ([21], 
pg 219). By extension, we will speak of consonant basic belief assignments, com- 
monality functions, plausibility functions, . . . 

Theorem 1. Consonant belief functions. ([21], Theorem 10.1, pg 220) Let
m be a bba on $\Omega$. Then the following assertions are all equivalent:

1. m is consonant.

2. $bel(A \cap B) = \min(bel(A), bel(B))$, $\forall A, B \subseteq \Omega$.

3. $pl(A \cup B) = \max(pl(A), pl(B))$, $\forall A, B \subseteq \Omega$.

4. $pl(A) = \max_{\omega \in A} pl(\omega)$, for all non-empty $A \subseteq \Omega$.

5. $q(A) = \min_{\omega \in A} pl(\omega) = \min_{\omega \in A} q(\omega)$, for all non-empty $A \subseteq \Omega$.
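
The equivalence between items 1 and 4 can be checked numerically on small frames. The sketch below is only an illustration under assumed data: the two bba's and all function names are hypothetical, and the check enumerates every non-empty event, so it is practical only for small frames.

```python
from itertools import combinations

def nonempty_subsets(frame):
    """All non-empty subsets of frame, as frozensets."""
    items = sorted(frame)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

def pl(m, a):
    """Plausibility: sum of the masses of the sets that intersect a."""
    return sum(v for b, v in m.items() if b & a)

def is_consonant(m):
    """True when the focal elements (positive mass) are nested."""
    focal = sorted((b for b, v in m.items() if v > 0), key=len)
    return all(x <= y for x, y in zip(focal, focal[1:]))

def is_maxitive(m, frame, tol=1e-12):
    """True when pl(A) equals the max of pl over the singletons of A."""
    return all(
        abs(pl(m, a) - max(pl(m, frozenset({w})) for w in a)) <= tol
        for a in nonempty_subsets(frame)
    )

OMEGA = frozenset({"a", "b", "c"})
nested = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.3, OMEGA: 0.2}
not_nested = {frozenset({"a"}): 0.5, frozenset({"b"}): 0.5}
print(is_consonant(nested), is_maxitive(nested, OMEGA))
print(is_consonant(not_nested), is_maxitive(not_nested, OMEGA))
```

On these two examples both tests agree: the nested bba passes both, the non-nested one fails both.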

Items 2 and 3 show that consonant belief and plausibility functions are
necessity and possibility functions, usually denoted by N and $\Pi$ respectively,
while the $pl(\omega)$’s define a possibility distribution that contains all the necessary
information for building the other set-functions. The fact that we work with
unnormalized bba’s does not affect these properties, it being understood that we
never require that possibility and necessity functions be normalized. The difference between $\Pi(\Omega)$ or $pl(\Omega)$ and 1, which equals $m(\emptyset)$, represents the amount of
conflict between the pieces of evidence that were used to build these functions.

2.3 Least commitment

So far, what ‘least committed’ means has not been explained. It refers to the
capability of comparing belief functions by their informational contents. Dubois
and Prade [8] have made three proposals to order belief functions according to
the ‘specificity’, or precision, of the beliefs they represent. Let $m_1$ and $m_2$ be two
bba’s on $\Omega$. The statement that $m_1$ contains at least as much information as (is
at least as precise as) $m_2$ is denoted $m_1 \sqsubseteq_x m_2$, corresponding to some x-ordering
where we vary the subscript x. Then $m_2$ is said to be x-less committed than $m_1$.
The proposed information orderings are:

— pl-ordering. If $pl_1(A) \leq pl_2(A)$ for all $A \subseteq \Omega$, we write $m_1 \sqsubseteq_{pl} m_2$.

— q-ordering. If $q_1(A) \leq q_2(A)$ for all $A \subseteq \Omega$, we write $m_1 \sqsubseteq_q m_2$.

— s-ordering. If $m_1$ is a specialization of $m_2$, we write $m_1 \sqsubseteq_s m_2$.

where pl denotes the plausibility function and q denotes the commonality function.

The idea behind the pl-ordering is that a belief function is all the more specific 
as the intervals ranging from the belief degree to the plausibility degree of each 
event are small. 

The idea behind the q-ordering is perhaps less obvious. The commonality
function of an event reflects the amount of support this event may receive from
its supersets. So, q(A) represents the portion of belief that may eventually be
assigned to A. The larger the amount of belief that remains unassigned, i.e. the bigger
the focal elements having a high mass assignment, the higher the commonality
degrees and the less informative the belief function. In particular, if $m(\Omega) = 1$,
then q(A) = 1 for all sets. More generally, considering $m(\Omega)$ as a rough measure
of the uninformativeness of a belief function seems reasonable. Suppose now we know
that the actual world belongs to $A \subseteq \Omega$. Then $m[A](A)$, obtained by conditioning
m with Dempster’s rule of conditioning, becomes the ‘conditional measure of
uninformativeness’ in context A. It just happens that $m[A](A) = q(A)$, so the
commonality function is the set of conditional measures of uninformativeness,
and the fact that a measure of information content turns out to be a function of
the q’s becomes very natural.

The concept of specialization (s-ordering) [7, 31] is at the core of the transferable belief model [19]. The intuitive idea is that the smaller the focal elements,
the more focused are the beliefs. Let $m_Y[BK]$ be the basic belief assignment that
represents Your belief on $\Omega$ given the background knowledge (BK) accumulated
by You. The impact of a new piece of evidence Ev induces a change in Your
beliefs characterized by a redistribution of the basic belief masses of $m_Y[BK]$
such that $m_Y[BK](A)$ is reallocated to the subsets of A. In a colloquial way,
we say that ‘the masses flow down’. The new belief function is said to be a
specialization of the former one. More generally, $m_2$ is a specialization of $m_1$ if
every mass $m_1(A)$ is reallocated to subsets of A in $m_2$. See [7] for the technical
definition.

Dubois and Prade [7] prove that:

— $m_1 \sqsubseteq_s m_2$ implies $m_1 \sqsubseteq_{pl} m_2$ and $m_1 \sqsubseteq_q m_2$, but the converse is not true,
and

— $m_1 \sqsubseteq_{pl} m_2$ and $m_1 \sqsubseteq_q m_2$ do not imply each other.



2.4 Pignistic probabilities 

Suppose a bba m quantifies Your beliefs on $\Omega$. When a decision must be
made that depends on the actual value $\omega_0$, where $\omega_0 \in \Omega$, You must construct
a probability function in order to make the optimal decision, i.e., the one that
maximizes the expected utility. This is achieved by the pignistic transformation.
Its nature and its justification are detailed in [23, 28, 25].

Smets [23] has shown that the only transformation from m to BetP that
satisfies some rationality requirements is the so-called pignistic transformation
that satisfies:

$$BetP(\omega) = \sum_{A : \omega \in A \subseteq \Omega} \frac{m(A)}{|A| \, (1 - m(\emptyset))}, \qquad \forall \omega \in \Omega \qquad (1)$$

where |A| is the number of elements of $\Omega$ in A.

It is easy to show that the function BetP is indeed a probability function
and that the pignistic transformation of a probability function is the probability function itself. We call it pignistic in order to avoid the confusion that would consist
in interpreting BetP as a measure representing Your beliefs on $\Omega$.
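
Equation (1) translates almost literally into code. The following Python sketch assumes a bba stored as a dictionary from `frozenset` to mass; the function name `pignistic` and the example masses are illustrative assumptions.

```python
def pignistic(m, frame):
    """Pignistic probability of a bba, following equation (1).

    Each mass m(A) is shared equally among the elements of A, after
    renormalising by 1 - m(empty set).
    """
    empty_mass = m.get(frozenset(), 0.0)
    if empty_mass >= 1.0:
        raise ValueError("all mass is on the empty set; BetP is undefined")
    return {
        w: sum(v / (len(a) * (1.0 - empty_mass))
               for a, v in m.items() if w in a)
        for w in frame
    }

OMEGA = frozenset({"a", "b", "c"})
m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.3, OMEGA: 0.2}
betp = pignistic(m, OMEGA)
print(betp, sum(betp.values()))

# A Bayesian bba (masses on singletons only) is left unchanged.
bayes = {frozenset({"a"}): 0.7, frozenset({"b"}): 0.3}
print(pignistic(bayes, OMEGA))
```

The second call illustrates the remark above that the pignistic transformation of a probability function is that probability function itself.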

The result showing that the least committed set of beliefs yielding a prescribed pignistic probability can be represented by a possibility function has
been formally obtained in two ways, depending on how belief functions are compared in terms of information content. Comparing the belief functions having a
prescribed pignistic probability, it can be proved that the least informed one in
the sense of the q-ordering is a possibility function. The belief functions having
a prescribed pignistic probability are called isopignistic. The following theorem
has been obtained:



Theorem 2. Let BetP be a pignistic probability function defined on $\Omega$, with
the elements $\omega_i$ of $\Omega$ so labeled that:

$$BetP(\omega_1) \geq BetP(\omega_2) \geq \cdots \geq BetP(\omega_n)$$

where $n = |\Omega|$. Let $\mathcal{B}_{isoP}(BetP)$ be the set of isopignistic belief functions
induced by BetP. The q-least committed element in $\mathcal{B}_{isoP}(BetP)$ is the
consonant belief function of mass $\hat{m}$ whose only focal elements are the subsets
$A_i = \{\omega_1, \omega_2, \ldots, \omega_i\}$ and:

$$\hat{m}(A_i) = |A_i| \cdot (BetP(\omega_i) - BetP(\omega_{i+1}))$$



where $BetP(\omega_{n+1})$ is 0 by definition.

The probability-possibility transformation described by the theorem was independently proposed by Dubois and Prade [13, 4] a long time ago, using a
very different rationale. The other informational orderings do not lead to unique
least informed solutions. However a scalar index for comparing belief functions
in terms of specificity has been proposed in [5]. The idea is based on the fact
that the level of imprecision of a set used to represent a piece of incomplete
knowledge is its cardinality (or the logarithm thereof). A belief function is formally a random set, and the degree of imprecision of a belief function is simply its
expected cardinality (or expected logarithm of the cardinality). Let

$$I(m) = \sum_{A \subseteq \Omega} |A| \cdot m(A).$$

Comparing isopignistic belief functions in terms of expected cardinality, the
same result as above is obtained:

Theorem 3. The belief function of maximal expected cardinality I(Bel) among
isopignistic belief functions induced by BetP is the unique possibility function
having this pignistic probability.
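
Theorems 2 and 3 can be illustrated together on a small example. The sketch below builds the consonant bba from a given pignistic probability, checks that it is isopignistic, and compares its expected cardinality with that of the Bayesian bba having the same pignistic probability. The probability values and function names are assumptions made for the illustration; comparing against a single competitor illustrates Theorem 3 but does not prove it.

```python
def least_committed_consonant(betp):
    """Consonant bba of Theorem 2: m(A_i) = |A_i| * (BetP_i - BetP_{i+1})."""
    order = sorted(betp, key=betp.get, reverse=True)
    probs = [betp[w] for w in order] + [0.0]  # BetP(w_{n+1}) = 0 by definition
    m = {}
    for i in range(len(order)):
        mass = (i + 1) * (probs[i] - probs[i + 1])
        if mass > 0:
            m[frozenset(order[: i + 1])] = mass
    return m

def pignistic(m, frame):
    """Pignistic transformation for a bba with no mass on the empty set."""
    return {w: sum(v / len(a) for a, v in m.items() if w in a) for w in frame}

def expected_cardinality(m):
    """I(m) = sum over focal sets A of |A| * m(A)."""
    return sum(len(a) * v for a, v in m.items())

betp = {"a": 0.6, "b": 0.3, "c": 0.1}
m_hat = least_committed_consonant(betp)
m_bayes = {frozenset({w}): p for w, p in betp.items()}  # also isopignistic

print(m_hat)
print(pignistic(m_hat, betp))
print(expected_cardinality(m_hat), expected_cardinality(m_bayes))
```

The possibility distribution associated with `m_hat` is obtained by summing, for each element, the masses of the focal sets containing it.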

3 The minimum rule 

The story goes on. Suppose we collect the pignistic probabilities about the actual
value of the variable from two sources. From these two pignistic probabilities, You build
two consonant plausibility functions, i.e., the two possibility functions induced
by the observed betting rates as presented above. How should the data collected from the
two sources be combined conjunctively? Do we have to redo the whole betting
procedure or can we get the result directly by combining the two possibility
functions? We will show in this section that the latter is indeed correct.

In possibility theory, there exists such a combination rule that performs the
conjunction of two possibility functions. Let $\pi_1$ and $\pi_2$ be two possibility distributions on $\Omega$ that we want to combine conjunctively into a new possibility
function $\pi_{12}$. The most classical conjunctive combination rule to build $\pi_{12}$ consists in using the minimum rule: $\pi_{12}(\omega) = \min(\pi_1(\omega), \pi_2(\omega))$ for all $\omega \in \Omega$, and
its related possibility measure is given by $\Pi_{12}(A) = \max_{\omega \in A} \pi_{12}(\omega)$ for $A \subseteq \Omega$. Could it
be applied in the present context? We will show here that it is indeed the case.
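
As a concrete illustration of the minimum rule, here is a short Python sketch. The two possibility distributions and the function names are assumed for the example.

```python
def min_rule(pi1, pi2):
    """Conjunctive combination: pointwise minimum of two distributions."""
    return {w: min(pi1[w], pi2[w]) for w in pi1}

def possibility(pi, event):
    """Possibility measure of a non-empty event: max of pi over the event."""
    return max(pi[w] for w in event)

pi1 = {"a": 1.0, "b": 0.7, "c": 0.3}
pi2 = {"a": 0.4, "b": 1.0, "c": 0.2}
pi12 = min_rule(pi1, pi2)

print(pi12)
print(possibility(pi12, {"a", "c"}))
# The result need not be normalized: 1 - max(pi12) measures the conflict
# between the two sources.
print(1.0 - possibility(pi12, pi12.keys()))
```

Here the combined distribution peaks at 0.7, so the two sources are partially conflicting, in line with the earlier remark that possibility functions are not required to be normalized.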

We must first avoid a classical trap. In belief function theory, the conjunctive 
rule of combination for the bba’s produced by two distinct pieces of evidence
is Dempster’s rule of combination. It is well known that Dempster’s rule of 
combination applied to two consonant plausibility functions does not produce 
a consonant plausibility function. So Dempster’s rule of combination does not 
seem adequate to combine possibility measures. It seems thus that the analogy 
between consonant plausibility functions and possibility functions collapses here. 
This is not the case. Dempster’s rule of combination requires that the involved 
pieces of evidence are ‘distinct’, a concept analogous to independence in random 
set theory. All we have here are the betting behaviors of the two sources, and
‘distinctness’ of the sources cannot be assumed.

In fact, other combination rules exist in the TBM, based on some kind of
cautious approach and where ‘possible correlations’ between the involved belief
functions are considered. How should two bba’s be combined conjunctively when You
cannot assume they are produced by two ‘distinct’ pieces of information? You
may assume that the result of the combination must be a specialization of each 
of them (since the result of the combination should be a belief function at least 
as informative as the ones You start with). As said above, a specialization of a 
bba mi is a transformation of mi into a new bba m2, both on the same frame 
of discernment, such that every mass mi (A) given by the first bba to a subset A 
of its frame is split and reallocated to the subsets of A so as to form the second 
bba. 

So consider all belief functions that are specializations of both initial possibility
functions derived from the pignistic probabilities produced by the two sources.
In that family, apply again the ‘cautious’ approach and select as Your belief
the least committed element of that family in the sense of specialization, which
is the strongest notion of information comparison. The main result is that this
procedure again yields a consonant plausibility function and it turns out to be 
exactly the result obtained within possibility theory when using the minimum 
rule. 



Theorem 4. Let $m_1$ and $m_2$ be two consonant belief functions on $\Omega$ with $q_1$
and $q_2$ their corresponding commonality functions. Let $\mathcal{SP}(m_1)$ and $\mathcal{SP}(m_2)$ be the sets
of specializations of $m_1$ and $m_2$, respectively. Let $q_{12}(A) = \min(q_1(A), q_2(A))$
for all $A \subseteq \Omega$, and $m_{12}$ its corresponding bba. Then $m_{12} = m_{1 \odot 2} = \min\{m :
m \in \mathcal{SP}(m_1) \cap \mathcal{SP}(m_2)\}$ in the sense of s-ordering, and this minimally specific
element is unique.

We call the last combination the hyper-cautious conjunctive combination 
rule. 
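
The construction in Theorem 4 can be traced numerically: take the pointwise minimum of the two commonality functions and recover the bba by Möbius inversion, $m(A) = \sum_{B \supseteq A} (-1)^{|B| - |A|} q(B)$. The Python sketch below does this for two assumed possibility distributions; names and values are illustrative, and the code only checks that the result is consonant and matches the minimum rule, not the minimality claim of the theorem.

```python
from itertools import combinations

def subsets(frame):
    """All subsets of frame (including the empty set), as frozensets."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def commonality_from_pi(pi):
    """For a consonant bba: q(A) = min of pi over A, and q(empty set) = 1."""
    return {a: (min(pi[w] for w in a) if a else 1.0) for a in subsets(pi)}

def mass_from_commonality(q):
    """Moebius inversion: m(A) = sum_{B >= A} (-1)^(|B|-|A|) q(B)."""
    return {a: sum((-1) ** (len(b) - len(a)) * v
                   for b, v in q.items() if a <= b)
            for a in q}

pi1 = {"a": 1.0, "b": 0.7, "c": 0.3}
pi2 = {"a": 0.4, "b": 1.0, "c": 0.2}
q1, q2 = commonality_from_pi(pi1), commonality_from_pi(pi2)
q12 = {a: min(q1[a], q2[a]) for a in q1}
m12 = {a: v for a, v in mass_from_commonality(q12).items() if v > 1e-12}

# Plausibility of singletons induced by m12, to compare with min(pi1, pi2).
pl12 = {w: sum(v for a, v in m12.items() if w in a) for w in pi1}
print(m12)
print(pl12)
```

On this example the focal elements of `m12` are nested, and `pl12` coincides with the pointwise minimum of `pi1` and `pi2`; part of the mass lands on the empty set because the two sources conflict.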

So the direct combination approach developed in possibility theory and the
one derived using the TBM detour are the same (see Figure 1). This result
restores the coherence between the two models, and thus using the TBM operational definition to explain the meaning of the possibility values is perfectly
valid and appropriate.

Fig. 1. Epistemic possibilities. Isopign = finding the set of belief functions that share
the same pignistic probabilities. LC = least committed.

4 Objectivist semantics

Since possibility measures are special cases of plausibility functions, they are
also, at the mathematical level, envelopes of special families of probability functions (see, e.g., [10, 2]). Let $\Pi$ be a possibility measure and $\mathcal{P}(\Pi)$ be the set of
probability functions dominated by $\Pi$.

Suppose a probability function P is obtained via some statistical experiment.
Suppose that for some reason one wishes to use a possibilistic representation of
this information, maybe because we just need an approximation of it, or because
we want to compute a linear convex combination of such probability functions without knowing the
weights (see section 5). The possibility measures $\Pi$ that are candidates for representing P must clearly be such that $P \in \mathcal{P}(\Pi)$. We shall say that $\Pi$ covers
P. Again there are many possibility measures obeying this constraint. It again
makes sense to use an informational principle to pick the best $\Pi$ induced by
P. However the situation is different from the subjectivist setting. In the latter,
the pignistic probability is just what is revealed about the epistemic state of the
agent by the betting experiments. So a principle of cautiousness prevails in order
to be faithful to the incompleteness of the information. In the objectivist setting, P represents the information. Moving from a probabilistic to a possibilistic
setting means losing information since we only get (special) probability bounds.

So the natural informational principle for picking a reasonable possibility distribution representing P is to preserve as much information as possible, hence
picking the most informative possibility distribution (in the sense of any x-ordering above) in $\Pi(P) = \{\Pi : P \in \mathcal{P}(\Pi)\}$ by taking the possibility function
that is pointwise minimal. It has been proved that this maximally informative
possibility distribution generally exists and is unique, in particular when P defines a total order
of a finite referential. It is also true for “bell-shaped” unimodal distributions on
the real line. When there are elements of equal probability, uniqueness is recovered if,
due to symmetry, we also enforce equal possibility of these elements. See details
in [12,20].
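
On a finite referential, the closed form usually given in the literature for this most specific covering distribution is $\pi(\omega_i) = \sum_{j : p_j \leq p_i} p_j$. This overview does not state the formula explicitly, so the Python sketch below should be read as an assumption about the transformation being referred to; the probability values and function names are illustrative. Ties receive equal possibility, as required by the symmetry argument above.

```python
from itertools import combinations

def most_specific_possibility(p):
    """pi(w) = sum of the probabilities that do not exceed p(w)."""
    return {w: sum(pj for pj in p.values() if pj <= pw) for w, pw in p.items()}

def events(frame):
    """All non-empty subsets of frame."""
    items = sorted(frame)
    return [set(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

def covers(pi, p, tol=1e-12):
    """True when max of pi over A is at least P(A) for every event A."""
    return all(max(pi[w] for w in a) + tol >= sum(p[w] for w in a)
               for a in events(p))

p = {"a": 0.6, "b": 0.3, "c": 0.1}
pi_obj = most_specific_possibility(p)
print(pi_obj)
print(covers(pi_obj, p))
```

For the same probability values, the subjectivist transformation of Theorem 2 gives the less specific distribution (1, 0.7, 0.3), which reflects the opposite informational principles used in the two settings.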

5 The maximum rule 

Again the story can be pursued by considering the fusion of two probability distributions $P_1$ and $P_2$ coming from two statistical experiments pertaining to the
same phenomenon. If the fusion takes place on the data, it is usually enough to
add the two sets of data, and derive the corresponding probability. It comes
down to a linear convex combination of $P_1$ and $P_2$ whose weights reflect the
relative amount of data of each source.

However if the original data sets are lost and only $P_1$ and $P_2$ are available,
the relative weights of the data sets are unknown. The probability distribution
resulting from merging the two data sets is of the form $\alpha P_1 + (1 - \alpha) P_2$ where $\alpha$
is unknown. It gives a family of probability distributions $\mathcal{F}$ and the question is
to find the most informative possibility measure $\Pi$ such that $\mathcal{F}$ is included
in $\mathcal{P}(\Pi)$ using the above principle of information preservation. Let $\Pi_1$ and $\Pi_2$
be the most informative possibility measures covering $P_1$ and $P_2$, respectively.
Then $\Pi_1 \geq P_1$ and $\Pi_2 \geq P_2$, eventwise. Now it is obvious that

$$\max(\Pi_1, \Pi_2) \geq \max(P_1, P_2) \geq \alpha P_1 + (1 - \alpha) P_2.$$

It turns out that the set function $\Pi = \max(\Pi_1, \Pi_2)$ is also a possibility measure with possibility distribution $\max(\pi_1, \pi_2)$. So $\Pi = \max(\Pi_1, \Pi_2)$ encodes
a family of probability measures that contains $\alpha P_1 + (1 - \alpha) P_2$ for any $\alpha$ in
the unit interval. However there are events A, B such that $P_1(A) = \Pi_1(A)$
and $P_2(B) = \Pi_2(B)$, basically the complements of the confidence sets. So
$\Pi = \max(\Pi_1, \Pi_2)$ is actually the valid upper bound, i.e. it covers all the convex
mixtures of $P_1$ and $P_2$. Now let $\Pi_\alpha$ be the most informative possibility measure
covering $\alpha P_1 + (1 - \alpha) P_2$. Obviously, the intersection over $\alpha$ of all sets of possibility measures less specific than $\Pi_\alpha$ has $\sup_\alpha \Pi_\alpha$ as a lower bound and it is the
most specific possibility measure covering all the convex mixtures of $P_1$ and $P_2$.
However it is clearly less than or equally specific as $\Pi$. Hence it is equal to it.
It is thus proved that the most informative possibility distribution covering all
the convex mixtures of $P_1$ and $P_2$ can be obtained as the idempotent disjunctive
combination of the possibility measures $\Pi_1$ and $\Pi_2$ obtained from $P_1$ and $P_2$.
Hence this setting justifies the maximum combination rule of possibility theory
(see Figure 2).
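
The covering property of the maximum rule can be checked numerically for a few mixing weights. The Python sketch below is an illustration under assumptions: the two probability distributions are made up, the closed form used for the most informative covering distribution on a finite set ($\pi(\omega_i) = \sum_{j : p_j \leq p_i} p_j$) is the one commonly given in the literature rather than one stated in this overview, and a finite grid of weights checks the claim without proving it.

```python
from itertools import combinations

def most_specific_possibility(p):
    """pi(w) = sum of the probabilities that do not exceed p(w) (assumed form)."""
    return {w: sum(pj for pj in p.values() if pj <= pw) for w, pw in p.items()}

def max_rule(pi1, pi2):
    """Disjunctive combination: pointwise maximum of two distributions."""
    return {w: max(pi1[w], pi2[w]) for w in pi1}

def covers(pi, p, tol=1e-12):
    """True when max of pi over A is at least P(A) for every non-empty event A."""
    items = sorted(p)
    all_events = [c for r in range(1, len(items) + 1)
                  for c in combinations(items, r)]
    return all(max(pi[w] for w in a) + tol >= sum(p[w] for w in a)
               for a in all_events)

p1 = {"a": 0.6, "b": 0.3, "c": 0.1}
p2 = {"a": 0.2, "b": 0.5, "c": 0.3}
pi_max = max_rule(most_specific_possibility(p1), most_specific_possibility(p2))

weights = (0.0, 0.25, 0.5, 0.75, 1.0)
mixtures = [{w: t * p1[w] + (1 - t) * p2[w] for w in p1} for t in weights]
print(pi_max)
print(all(covers(pi_max, mix) for mix in mixtures))
```

The combined distribution is computed from the two possibility distributions alone, without access to the mixing weight, which is the practical point of the maximum rule.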




Fig. 2. Objective possibilities. Dom = dominating possibility measures. MI = Maxinf
= maximally informative possibility measure. $\cap_\alpha$ = intersection over all $\alpha$ in [0,1]. $\cup_\alpha$
= union over all $\alpha$ in [0,1]. Other symbols as in text.



6 Conclusion 

This paper studies two operational settings for the measurement of degrees of 
possibility. In the first one, quantitative epistemic possibility theory can be
viewed as a very cautious application of the TBM. It uses the operational definition of the TBM as an operational definition of the values of the possibility
function whereby the betting rates provided by an agent only partially reflect beliefs. In a frequentist setting, a possibility measure can be induced from frequency
observations as a consonant family of certain confidence sets. These operational 
settings shed light on well-known idempotent combination rules of possibility 
theory. The minimum and maximum rules are justified, one in each setting, 
based on opposite information principles. We provide a semantics for fuzzy set 
theory through quantitative possibility theory, based either on standard behav- 
ioral methods of subjective probability or as an extension of standard statistical 
practice. In both cases a probability measure is replaced by a possibility measure.

Abstract. Possibility theory offers a qualitative framework for representing uncertain knowledge or prioritized desires. A remarkable feature of this framework
is the existence of three distinct compact representation formats which are all semantically equivalent to a ranking of possible worlds encoded by a possibility
distribution. These formats are respectively: i) a set of weighted propositional
formulas; ii) a set of strict comparative possibility statements of the form “p is
more possible than q”, and iii) a directed acyclic graph where links are weighted
by possibility degrees (either qualitative or quantitative). This paper exhibits the
direct translation between these formats without resorting to a semantical (exponential) computation at the possibility distribution level. These translations are
useful for fusing heterogeneous information, and are necessary for taking advantage of the merits of each format at the representational or at the inferential level.



1 Introduction 

Usually, knowledge can be equivalently expressed in different formats. However, most
of the approaches to reasoning under uncertain or incomplete information privilege a
particular compact representation framework which is used both as a basis for communicating information and for performing inferences. Yet the interest of working
with different representation modes has been pointed out recently in several works [4,
9]. Clearly, the use of several representation formats raises the issue of their representational equivalence, of their translation into another format, and of their respective merits
(e.g., for eliciting knowledge, or for computational purposes).

In that respect, the possibility theory framework [10] offers different formats for
representing knowledge either in terms of a possibilistic logic base [5] where classical formulas are associated with certainty weights, or in a graphical manner, using a
possibilistic directed acyclic graph (DAG) [7, 1] for exhibiting some independence
structure, or also in comparative terms expressing (under the form of constraints) that
some situations are more possible than others [2]. Each of these representations has
been shown to be equivalent to a possibility distribution which rank-orders the possible worlds according to their level of plausibility. These formats can be used not only
for representing knowledge, but also for modelling desires, in which case the possibilistic logic
weights express priorities, and the possibility distribution encodes the levels of satisfaction of reaching each world. The framework can be made fully qualitative by referring
only to a stratification of the formulas (where the distribution is replaced by a well ordered partition), or may use a symbolic discrete linearly ordered scale, or can as well
be interfaced with numerical settings by using the unit interval as a scale. According to
the chosen scale, a different type of conditioning should be used [6].

This paper offers an overview of the translations between the three representations
by summarizing existing results and providing the remaining bridges between representation formats. The same example is used throughout the paper for illustrating the different
transformations. The advantages of each format are briefly pointed out.



2 Background on possibility theory 



A possibility distribution $\pi$ is a function mapping a set of interpretations $\Omega$ into a linearly ordered scale, usually the interval [0, 1]. $\pi(\omega)$ represents the degree of compatibility of the interpretation $\omega$ with the available beliefs about the real world in a case
of uncertain information, or the satisfaction degree of reaching a state $\omega$ for modelling
classical preferences. $\pi(\omega) = 1$ means that it is totally possible for $\omega$ to be the real
world (or that $\omega$ is fully satisfactory), $1 > \pi(\omega) > 0$ means that $\omega$ is only somewhat
possible (or satisfactory), while $\pi(\omega) = 0$ means that $\omega$ is certainly not the real world
(or not satisfactory at all). A possibility distribution is said to be normalized if $\exists \omega$ s.t.
$\pi(\omega) = 1$. Only normalized distributions are considered here.

Given a possibility distribution $\pi$, two measures are defined which rank order the
formulas of the language. The possibility measure of a formula $\phi$: $\Pi(\phi) = \max\{\pi(\omega) : \omega \models \phi\}$, which evaluates the extent to which $\phi$ is consistent with the available beliefs
expressed by $\pi$, and the necessity (or certainty) measure of a formula $\phi$: $N(\phi) = 1 - \Pi(\neg\phi)$, which evaluates the extent to which $\phi$ is entailed by the available beliefs.
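
Both measures are easy to compute by enumerating interpretations. The Python sketch below assumes three Boolean variables and a three-level distribution whose degrees (1, 0.5, 0.25) are chosen only for illustration; formulas are represented as Python predicates over an interpretation, and all names are hypothetical.

```python
from itertools import product

# Interpretations over three Boolean variables (r, s, w).
WORLDS = list(product([False, True], repeat=3))

def pi(world):
    """Illustrative three-level possibility distribution (assumed degrees)."""
    r, s, w = world
    if r and s:
        return 0.25
    if not r and s == w:
        return 1.0
    return 0.5

def possibility(phi):
    """Pi(phi) = max of pi over the models of phi (0 if phi has no model)."""
    return max((pi(world) for world in WORLDS if phi(world)), default=0.0)

def necessity(phi):
    """N(phi) = 1 - Pi(not phi)."""
    return 1.0 - possibility(lambda world: not phi(world))

def rains(world):
    return world[0]

def not_both(world):
    return not (world[0] and world[1])

print(possibility(rains), necessity(rains))
print(necessity(lambda world: not rains(world)))
print(necessity(not_both))
```

The enumeration is exponential in the number of variables, which is precisely the cost the compact formats discussed in this paper aim to avoid.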

In a qualitative setting, the possibility distribution $\pi$ can be represented by its well
ordered partition $WOP(\pi) = E_1 \cup \cdots \cup E_n$ such that: $E_1 \cup \cdots \cup E_n = \Omega$, $E_i \cap E_j = \emptyset$ for $i \neq j$,
and $\pi(\omega) > \pi(\omega')$ iff $\omega \in E_i$, $\omega' \in E_j$ and $i < j$.

Each possibility distribution has a unique well ordered partition, while the converse is
false. However all numerical counterparts $\pi$ of $WOP(\pi) = E_1 \cup \cdots \cup E_n$ are obtained
by associating weights $\alpha_i$ to $E_i$ such that $1 = \alpha_1 > \alpha_2 > \cdots > \alpha_n > 0$.

Conditioning in possibility theory depends on whether we use an ordinal or a numerical
scale. In an ordinal setting, min-based conditioning is used and is defined as follows:
$\Pi(q \mid_m p) = 1$ if $\Pi(p \wedge q) = \Pi(p)$, and $\Pi(q \mid_m p) = \Pi(p \wedge q)$ if $\Pi(p \wedge q) < \Pi(p)$.
In a numerical setting, the product-based conditioning is used:

$$\Pi(q \mid_\times p) = \frac{\Pi(p \wedge q)}{\Pi(p)}.$$

Moreover, if $\Pi(p) = 0$, then $\Pi(q \mid_\times p) = \Pi(q \mid_m p) = 1$.

Both forms of conditioning satisfy an equation of the form $\Pi(p \wedge q) = \square(\Pi(q \mid p), \Pi(p))$, which
is similar to Bayesian conditioning, for $\square$ = min or product.
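
The two forms of conditioning, and the decomposition they both satisfy, can be sketched as follows. Events are represented as sets of interpretations rather than formulas to keep the example short; the distribution, the event names and the function names are illustrative assumptions.

```python
def poss(pi, event):
    """Possibility of an event (set of worlds); 0 for the empty event."""
    return max((pi[w] for w in event), default=0.0)

def cond_min(pi, q, p):
    """Min-based conditioning Pi(q |_m p), for the ordinal setting."""
    if poss(pi, p) == 0.0:
        return 1.0
    joint = poss(pi, p & q)
    return 1.0 if joint == poss(pi, p) else joint

def cond_prod(pi, q, p):
    """Product-based conditioning Pi(q |_x p), for the numerical setting."""
    if poss(pi, p) == 0.0:
        return 1.0
    return poss(pi, p & q) / poss(pi, p)

pi = {"w1": 1.0, "w2": 0.5, "w3": 0.25, "w4": 0.5}
p = {"w2", "w3", "w4"}
q = {"w1", "w3"}

print(cond_min(pi, q, p), cond_prod(pi, q, p))
# Decomposition: Pi(p and q) = min(Pi(q |_m p), Pi(p)) = Pi(q |_x p) * Pi(p)
print(poss(pi, p & q) == min(cond_min(pi, q, p), poss(pi, p)))
print(abs(poss(pi, p & q) - cond_prod(pi, q, p) * poss(pi, p)) < 1e-12)
```

The two results differ here: min-based conditioning keeps the joint degree unchanged, while product-based conditioning rescales it by the possibility of the context.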



3 Compact representations 

This section presents three compact representations of possibility distributions: a possibilistic knowledge base denoted by $\Sigma$, a strict comparative possibility base denoted by
$\mathcal{P}$ and a possibilistic graph denoted by $\Pi G$.
In the following we recall each of these compact representations and show their correspondences with possibility distributions (or well ordered partitions). We will use the
following example for illustrating the different transformations.

Example 1. Let r, s, w be three symbols which stand for “it rains”, “sprinkler is on” and
“the grass is wet” respectively. Let $\Omega = \{\omega_0 = \neg r \neg s \neg w,\ \omega_1 = \neg r \neg s w,\ \omega_2 = \neg r s \neg w,\
\omega_3 = \neg r s w,\ \omega_4 = r \neg s \neg w,\ \omega_5 = r \neg s w,\ \omega_6 = r s \neg w,\ \omega_7 = r s w\}$.

Let $\pi$ be a possibility distribution defined as follows:

$\pi(\omega_0) = \pi(\omega_3) = 1$, $\pi(\omega_1) = \pi(\omega_2) = \pi(\omega_4) = \pi(\omega_5) = \frac{1}{2}$ and $\pi(\omega_6) = \pi(\omega_7) = \frac{1}{4}$.

In this example, we have only considered three levels of possibility degrees for simplicity. The most normal worlds are “it does not rain, and either sprinkler is on and the grass
is wet or sprinkler is off and grass is dry”. The most surprising worlds (having weight
$\frac{1}{4}$) are encountered when it rains and sprinkler is on. However, one could refine this distribution further to better match reality. For instance one could split the worst worlds
by considering that “it rains and sprinkler is on and grass is wet” is less surprising than
“it rains and sprinkler is on and the grass is dry”. However, to keep the example
simple enough, we only consider three levels, here arbitrarily encoded as $\frac{1}{4}$, $\frac{1}{2}$ and 1.



3.1 Possibilistic knowledge bases 



A possibilistic knowledge base is a set of weighted formulas of the form $\Sigma = \{(\phi_i, \alpha_i) :
i = 1, \ldots, n\}$ where $\phi_i$ is a classical formula and $\alpha_i$ belongs to [0, 1] in a numerical setting
and represents the level of certainty or priority attached to $\phi_i$.

In a qualitative setting, a possibilistic base $\Sigma$ can be represented by a well ordered partition $WOP(\Sigma) = \Sigma_1 \cup \cdots \cup \Sigma_n$ where $\Sigma_1$ contains the most certain classical formulas
in $\Sigma$, $\Sigma_n$ contains the least certain ones, and more generally formulas in $\Sigma_i$ are strictly more
certain than formulas in $\Sigma_j$ when $i < j$. For each well ordered partition $\Sigma_1 \cup \cdots \cup \Sigma_n$
we can construct a possibilistic base $\Sigma$ by associating to each formula in $\Sigma_i$ a weight
$\alpha_i$, such that $1 > \alpha_1 > \cdots > \alpha_n > 0$.

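Given a possibilistic base $\Sigma$, we can generate a unique possibility distribution $\pi_\Sigma$,
where interpretations will be ranked w.r.t. the highest formula that they falsify [5]:

$$\forall \omega \in \Omega, \quad \pi_\Sigma(\omega) = \begin{cases} 1 & \text{if } \forall (\phi_i, \alpha_i) \in \Sigma,\ \omega \models \phi_i \\ 1 - \max\{\alpha_i : (\phi_i, \alpha_i) \in \Sigma \text{ and } \omega \not\models \phi_i\} & \text{otherwise.} \end{cases} \qquad (1)$$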



The converse transformation from $\pi$ to $\Sigma$ is straightforward. Let $1 > \alpha_1 > \cdots > \alpha_n > 0$
be the different weights used in $\pi$. Let $\phi_i$ be a classical formula whose models are those
having the weight $\alpha_i$ in $\pi$. Let $\Sigma = \{(\neg\phi_i, 1 - \alpha_i) : i = 1, \ldots, n\}$. Then $\pi_\Sigma = \pi$.
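
Both directions of this translation can be sketched in a few lines. In the Python illustration below, formulas are represented semantically as predicates over interpretations, which sidesteps syntax but is exponential in the number of variables; the distribution, its degrees and all names are assumptions made for the example.

```python
from itertools import product

WORLDS = list(product([False, True], repeat=3))  # interpretations (r, s, w)

def pi_from_base(base, world):
    """Equation (1): rank a world by the most certain formula it falsifies."""
    falsified = [alpha for phi, alpha in base if not phi(world)]
    return 1.0 if not falsified else 1.0 - max(falsified)

def base_from_pi(pi):
    """Converse transformation: one formula (not phi_i, 1 - alpha_i) per level."""
    levels = sorted({v for v in pi.values() if v < 1.0}, reverse=True)
    base = []
    for alpha in levels:
        models = frozenset(w for w, v in pi.items() if v == alpha)
        # "not phi_i" holds exactly in the worlds outside the models of phi_i.
        base.append((lambda world, ms=models: world not in ms, 1.0 - alpha))
    return base

def degree(world):
    """Illustrative three-level distribution (assumed degrees)."""
    r, s, w = world
    if r and s:
        return 0.25
    if not r and s == w:
        return 1.0
    return 0.5

pi = {world: degree(world) for world in WORLDS}
sigma = base_from_pi(pi)
pi_sigma = {world: pi_from_base(sigma, world) for world in WORLDS}
print([alpha for _, alpha in sigma])
print(pi_sigma == pi)
```

The default argument `ms=models` freezes the set of models for each generated predicate, which avoids the usual late-binding pitfall of lambdas created in a loop.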



3.2 Strict comparative possibility bases 

A strict comparative possibility base $\mathcal{P}$ is a set of constraints of the form “in context
p, q is more possible than $\neg q$”, i.e., $\Pi(p \wedge q) > \Pi(p \wedge \neg q)$, denoted by $p \rightarrow q$.
This can either express a general rule having exceptions, or the conditional desire of
an agent. It encompasses the general case of constraints of the form $\Pi(r) > \Pi(s)$,
which is equivalent to the default rule $r \vee s \rightarrow \neg s$ [2] since $\Pi((r \vee s) \wedge \neg s) >
\Pi((r \vee s) \wedge s) \equiv \Pi(r \wedge \neg s) > \Pi(s) \equiv \Pi(r) > \Pi(s)$ ($\equiv \max(\Pi(r \wedge s), \Pi(r \wedge \neg s)) >
\max(\Pi(r \wedge s), \Pi(\neg r \wedge s))$).

Each strict comparative possibility base $\mathcal{P}$ induces a unique qualitative possibility
distribution $WOP(\pi_{\mathcal{P}})$ obtained by considering the least specific solution satisfying:

$$\Pi(p_i \wedge q_i) > \Pi(p_i \wedge \neg q_i) \qquad (2)$$

for all $p_i \rightarrow q_i$ of $\mathcal{P}$. The constraint (2) means that the most plausible situations where
$p_i \wedge q_i$ is true are preferred to the most plausible situations where $p_i \wedge \neg q_i$ is true.

In [2], an algorithm has been provided to compute $WOP(\pi_{\mathcal{P}}) = E_1 \cup \dots \cup E_n$.
Here we only recall its basic idea, which consists in putting each interpretation in the
lowest possible rank (or highest possibility degree) without violating constraints (2).
The only case where we cannot put $\omega$ in some $E_i$ is when $\omega$ is in the right part
of some constraint (i.e., there is a rule $p_i \to q_i$ such that $\omega \models p_i \wedge \neg q_i$),
and none of the interpretations of the left part of this constraint (i.e., those satisfying
$p_i \wedge q_i$) is already classified in some $E_j$ with $j < i$. Therefore
$WOP(\pi_{\mathcal{P}})$ is computed as follows: at each step $i$, we put in $E_i$ all
interpretations which are not in the right part of any remaining constraint, then we
remove all rules $p_i \to q_i$ such that there exists at least one $\omega$ in $E_i$ with
$\omega \models p_i \wedge q_i$. The following example illustrates the steps of the algorithm:

Example 2. Let $\mathcal{P} = \{r \to \neg s,\ s \vee r \to w,\ \neg s \to \neg w\}$. These rules
stand for: "generally, if it rains, then the sprinkler is off", "generally, if either it rains or
the sprinkler is on, then the grass is wet" and "generally, if the sprinkler is off, the grass is
dry". Let $C$ be the set of constraints associated with $\mathcal{P}$:

$C = \{C_1 : \Pi(r \wedge \neg s) > \Pi(r \wedge s),\ C_2 : \Pi((s \vee r) \wedge w) > \Pi((s \vee r) \wedge \neg w),\ C_3 : \Pi(\neg s \wedge \neg w) > \Pi(\neg s \wedge w)\}$.

Let $C_D = \{(L(C_i), R(C_i)) : i = 1, \dots, 3\} = \{(\{\omega_4, \omega_5\}, \{\omega_6, \omega_7\}),\ (\{\omega_3, \omega_5, \omega_7\}, \{\omega_2, \omega_4, \omega_6\}),\ (\{\omega_0, \omega_4\}, \{\omega_1, \omega_5\})\}$,
where the pair $(L(C_i), R(C_i))$ means
$\max\{\pi(\omega) : \omega \models p_i \wedge q_i\} > \max\{\pi(\omega) : \omega \models p_i \wedge \neg q_i\}$.

At the first step, we put in $E_1$ the interpretations which do not belong to any $R(C_i)$ in $C_D$;
we get $E_1 = \{\omega_0, \omega_3\}$. Then, we remove the pairs in $C_D$ s.t. $L(C_i)$ contains at
least one interpretation from $E_1$; we get $C_D = \{(\{\omega_4, \omega_5\}, \{\omega_6, \omega_7\})\}$.
In a similar way, we get $E_2 = \{\omega_1, \omega_2, \omega_4, \omega_5\}$ and
$E_3 = \{\omega_6, \omega_7\}$. It can be checked that $\mathcal{P}$ induces the same distribution as
in Example 1.
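The stratification procedure recalled above can be sketched in a few lines. This is an illustrative brute-force version, not the algorithm of [2] as published: rules are pairs of Python predicates, interpretations are enumerated exhaustively, and the numbering $\omega_k$ with $k = 4r + 2s + w$ is an assumption inferred from the sets listed in Example 2.

```python
from itertools import product

VARS = ("r", "s", "w")

# Rules p -> q of Example 2, written as (p, q) predicate pairs.
RULES = [
    (lambda m: m["r"], lambda m: not m["s"]),            # r -> not s
    (lambda m: m["s"] or m["r"], lambda m: m["w"]),      # s or r -> w
    (lambda m: not m["s"], lambda m: not m["w"]),        # not s -> not w
]

def wop_from_rules(rules, variables=VARS):
    """Return the strata E_1, E_2, ... as sets of interpretation indices."""
    omegas = [dict(zip(variables, bits))
              for bits in product((False, True), repeat=len(variables))]
    # Each rule becomes a pair (L, R): models of p and q, models of p and not q.
    constraints = []
    for p, q in rules:
        left = {i for i, m in enumerate(omegas) if p(m) and q(m)}
        right = {i for i, m in enumerate(omegas) if p(m) and not q(m)}
        constraints.append((left, right))
    remaining = set(range(len(omegas)))
    strata = []
    while remaining:
        blocked = set().union(*(right for _, right in constraints))
        stratum = remaining - blocked
        if not stratum:
            raise ValueError("the rule base admits no solution")
        strata.append(stratum)
        remaining -= stratum
        # Drop the constraints whose left part is now classified.
        constraints = [(l, r) for l, r in constraints if not (l & stratum)]
    return strata
```

On the rules of Example 2, the function returns the three strata given in the text.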

Let us now provide the converse transformation from $\pi$ to $\mathcal{P}$. Again let
$1 = a_1 > a_2 > \dots > a_n > 0$ be the different weights used in $\pi$, and $\phi_i$ be the
classical formula whose models have a weight equal to $a_i$. Then, the comparative base
associated to $\pi$ is:

$\mathcal{P} = \{\top \to \phi_1,\ \neg\phi_1 \to \phi_2,\ \dots,\ (\neg\phi_1 \wedge \dots \wedge \neg\phi_{i-1}) \to \phi_i,\ \dots,\ (\neg\phi_1 \wedge \neg\phi_2 \wedge \dots \wedge \neg\phi_{n-2}) \to \phi_{n-1}\}$.

This strict comparative possibility base means that the most normal situations are those
satisfying $\phi_1$, then those satisfying $\phi_2$ if $\phi_1$ is false, and so on. Let
$\pi_{\mathcal{P}}$ be the possibility distribution associated with $\mathcal{P}$. Then,
$WOP(\pi_{\mathcal{P}}) = WOP(\pi)$.



3.3 Possibilistic networks 

The last compact representation is graphical and is based on conditioning. Symbolic
knowledge is represented by DAGs, where nodes represent variables (in this paper, we
assume that they are binary), and edges express influence links between variables. When
there exists a link from $A$ to $B$, $A$ is said to be a parent of $B$. The set of parents of a
given node $A$ is denoted by $Par(A)$. By the capital letter $A$ we denote a variable which
represents either the symbol $a$ or its negation. An interpretation in this section will be
simply denoted by $A_1 \cdots A_n$. Uncertainty is expressed at each node as follows:

- For root nodes $A_i$ we provide the prior possibility of $a_i$ and of its negation $\neg a_i$.
These priors should satisfy the normalization condition: $\max(\Pi(a_i), \Pi(\neg a_i)) = 1$.

- For other nodes $A_j$, we provide the conditional possibility of $a_j$ and of its negation
$\neg a_j$ given any complete instantiation $\omega_{Par(A_j)}$ of the parents of $A_j$. These
conditional possibilities should also satisfy the normalization condition:

$\max(\Pi(a_j \mid \omega_{Par(A_j)}), \Pi(\neg a_j \mid \omega_{Par(A_j)})) = 1$.
Due to the existence of two definitions of possibilistic conditioning, we get two kinds
of possibilistic graphs, which we denote respectively by $\Pi G_m$ and $\Pi G_\times$, depending
on whether we use min-based or product-based conditioning. A min-based or product-based
possibilistic graph induces a unique joint distribution using the chain rule ($\square$ = min or product):

Definition 1 Let $\Pi G$ be a directed possibilistic acyclic graph. The joint possibility
distribution associated with $\Pi G$ is computed with the following equation (called chain rule):
$\pi(\omega) = \square\{\Pi(a \mid \omega_{Par(A)}) : \omega \models a \text{ and } \omega \models \omega_{Par(A)}\}$.
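The chain rule of Definition 1 can be sketched as follows. The network structure (R as root, R parent of S, S and R parents of W) and every degree in `TABLES` are illustrative assumptions; the helper names `joint`, `product_rule` and `all_models` are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from itertools import product

F, T = False, True
ONE = Fraction(1)

# Hypothetical graph: node -> tuple of parents.
PARENTS = {"R": (), "S": ("R",), "W": ("S", "R")}

# TABLES[X][parent values][value] = Pi(X = value | parents); each row has maximum 1.
TABLES = {
    "R": {(): {T: Fraction(2, 3), F: ONE}},
    "S": {(T,): {T: Fraction(1, 2), F: ONE},
          (F,): {T: ONE, F: ONE}},
    "W": {(F, F): {T: Fraction(2, 3), F: ONE},
          (T, F): {T: ONE, F: Fraction(2, 3)},
          (F, T): {T: ONE, F: ONE},
          (T, T): {T: ONE, F: ONE}},
}

def product_rule(x, y):
    return x * y

def joint(model, combine):
    """Chain rule: combine, with min or product, the local degree of each node."""
    degrees = []
    for node, parents in PARENTS.items():
        key = tuple(model[p] for p in parents)
        degrees.append(TABLES[node][key][model[node]])
    return reduce(combine, degrees)

def all_models():
    names = list(PARENTS)
    return [dict(zip(names, bits))
            for bits in product((False, True), repeat=len(names))]
```

Passing `min` as `combine` gives the min-based reading and `product_rule` the product-based one; with the same tables the two joint distributions generally differ, which is why the graphs $\Pi G_m$ and $\Pi G_\times$ carry different conditional degrees.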

The converse transformation is straightforward. As said in Section 2.3, the two
definitions of conditioning satisfy the Bayesian rule. Hence, for $\omega = A_1 A_2 \cdots A_n$ we have:

$\pi(A_1 \cdots A_n) = \square(\Pi(A_1 \mid A_2 A_3 \cdots A_n),\ \Pi(A_2 A_3 \cdots A_n))$.

Applying the Bayesian rule repeatedly for any given ordering $A_1, \dots, A_n$ yields:

$\pi(A_1 \cdots A_n) = \square(\Pi(A_1 \mid A_2 \cdots A_n),\ \Pi(A_2 \mid A_3 \cdots A_n),\ \dots,\ \Pi(A_{n-1} \mid A_n),\ \Pi(A_n))$.



4 Logical bases and comparative bases 

4.1 From a comparative base to a possibilistic base [2] 

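Algorithm 1.1 shows how to transform a comparative base $\mathcal{P}$ into a possibilistic base
$\Sigma$.

Algorithm 1.1: From $\mathcal{P}$ to $\Sigma$

begin

    $m \leftarrow 1$;
    while $\mathcal{P} \neq \emptyset$ do

        $\Delta = \{\neg p_i \vee q_i : p_i \to q_i \in \mathcal{P}\}$;

        $\mathcal{P}_m = \{p_i \to q_i : p_i \to q_i \in \mathcal{P} \text{ and } \Delta \cup \{p_i\} \text{ is consistent}\}$;

        $\mathcal{P} = \mathcal{P} - \mathcal{P}_m$, $m \leftarrow m + 1$;

    return $\Sigma = \{(\neg p_i \vee q_i, j/m) : p_i \to q_i \in \mathcal{P}_j\}$

end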

The stratification in $\Sigma$ reflects the specificity relation between elements of $\mathcal{P}$
when the latter encodes rules having exceptions. For instance, $\mathcal{P}_1$ contains only the
most general rules. Indeed, if $p \to q$ is not in $\mathcal{P}_1$, then $\Delta \cup \{p\}$ is
inconsistent, hence $p$ would inherit its own property $q$, but also the opposite property
$\neg q$ from some superclass, hence it is not a general rule. This analysis is iterated in the
algorithm. Let $\mathcal{P}$ be a comparative base, and $\Sigma$ be its possibilistic counterpart
given by the previous algorithm. Then, $WOP(\pi_{\mathcal{P}}) = WOP(\pi_\Sigma)$.

Example 3. Let $\mathcal{P} = \{\neg s \to \neg w,\ s \vee r \to w,\ r \to \neg s\}$ be the base
considered in Example 2.

We have $\Delta = \{s \vee \neg w,\ \neg(s \vee r) \vee w,\ \neg r \vee \neg s\}$. Applying
Algorithm 1.1, we get: $m = 3$,
$\mathcal{P}_1 = \{\neg s \to \neg w,\ s \vee r \to w\}$ and $\mathcal{P}_2 = \{r \to \neg s\}$. Hence,

$\Sigma = \{(s \vee \neg w, 1/3),\ (\neg s \vee w, 1/3),\ (\neg r \vee w, 1/3),\ (\neg r \vee \neg s, 2/3)\}$.
We can check that $WOP(\pi_\Sigma) = WOP(\pi_{\mathcal{P}})$. Moreover, $\Sigma$ is such that
$\pi_\Sigma = \pi$, where $\pi$ is given in Example 1.
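The loop of Algorithm 1.1 can be sketched with a brute-force consistency test. This is only an illustration under stated assumptions: rules are named triples of Python predicates, satisfiability is checked by enumerating all interpretations over three variables, and the weight $j/m$ assigned to stratum $j$ (with $m$ the final counter value) is an assumed reading of the return line.

```python
from fractions import Fraction
from itertools import product

VARS = ("r", "s", "w")

def consistent(formulas):
    """Brute-force satisfiability of a list of predicates over VARS."""
    return any(all(f(dict(zip(VARS, bits))) for f in formulas)
               for bits in product((False, True), repeat=len(VARS)))

# (name, p, q) stands for the default rule p -> q.
RULES = [
    ("not s -> not w", lambda m: not m["s"], lambda m: not m["w"]),
    ("s or r -> w", lambda m: m["s"] or m["r"], lambda m: m["w"]),
    ("r -> not s", lambda m: m["r"], lambda m: not m["s"]),
]

def stratify(rules):
    """Return the strata P_1, P_2, ... computed by the while loop."""
    remaining = list(rules)
    strata = []
    while remaining:
        # Delta: material counterparts (not p or q) of the remaining rules.
        delta = [lambda m, p=p, q=q: (not p(m)) or q(m) for _, p, q in remaining]
        tolerated = [rule for rule in remaining if consistent(delta + [rule[1]])]
        if not tolerated:
            raise ValueError("inconsistent rule base")
        strata.append(tolerated)
        remaining = [rule for rule in remaining if rule not in tolerated]
    return strata

def weights(rules):
    """Assumed weighting: a rule in stratum j gets j / m, with m = number of strata + 1."""
    strata = stratify(rules)
    m = len(strata) + 1
    return {rule[0]: Fraction(j, m)
            for j, stratum in enumerate(strata, start=1) for rule in stratum}
```

On the three rules of Example 3 the two most general rules land in the first stratum and `r -> not s` in the second.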

4.2 From a logical base to a comparative base 

Let $\Sigma = S_1 \cup \dots \cup S_n$ be a possibilistic knowledge base where formulas of $S_i$ are
prioritized over formulas of $S_j$ for $i < j$. Let us denote
$\neg S_i = \bigvee_{\phi \in S_i} \neg\phi$. Then, we can check that the set of strict comparative
possibility constraints $\mathcal{P}_\Sigma$ associated with $\Sigma$ is given as follows:

$\mathcal{P}_\Sigma = \{\top \to S_n,\ \neg S_{n-1} \vee \neg S_n \to S_{n-1},\ \neg S_{n-2} \vee \neg S_{n-1} \to S_{n-2},\ \dots,\ \neg S_1 \vee \neg S_2 \to S_1\}$.

The rule $\neg S_i \vee \neg S_{i+1} \to S_i$ means that if $S_i$ or $S_{i+1}$ is false, then
plausibly we prefer to reject $S_{i+1}$ and accept $S_i$. This simply reflects the priorities
between the $S_i$'s dictated by the possibilistic base $\Sigma$. It can be checked that
$WOP(\pi_{\mathcal{P}_\Sigma}) = WOP(\pi_\Sigma)$.

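Example 4. Let $\Sigma = \{(s \vee \neg w, 1/3),\ (\neg s \vee w, 1/3),\ (\neg r \vee w, 1/3),\ (\neg r \vee \neg s, 2/3)\}$.
Then, $\Sigma = S_1 \cup S_2$ s.t. $S_1 = \{\neg r \vee \neg s\}$ and
$S_2 = \{s \vee \neg w,\ \neg s \vee w,\ \neg r \vee w\}$.

Then, $\mathcal{P} = \{\top \to (s \vee \neg w) \wedge (\neg s \vee w) \wedge (\neg r \vee w),\ \neg(\neg r \vee \neg s) \vee \neg((\neg s \vee w) \wedge (\neg r \vee w) \wedge (s \vee \neg w)) \to \neg r \vee \neg s\}$,
which is equivalent to
$\{\top \to (sw \vee \neg r \neg s \neg w),\ rs \vee s \neg w \vee r \neg w \vee \neg s w \to \neg r \vee \neg s\}$.
The comparative base obtained in this example apparently differs from the one of Example 2,
even if both induce the same joint distribution. However, we could recover the syntactic
equivalence between these default rules using System P and rational monotony.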

Note that the same possibilistic base can be described by different strict comparative
possibility bases. This is not surprising since, as in classical logic, two different sets of
formulas can have the same set of models. The rules used to show the syntactic equivalence
between comparative possibility bases are System P and rational monotony [8]. Indeed,
possibilistic logic is in full agreement with System P and rational monotony [2].

5 Possibilistic bases and possibilistic graphs 

5.1 From graph to possibilistic bases 

The basic idea is that a possibilistic base associated with a graph can be viewed as the
result of fusing elementary bases. These elementary bases, associated with each variable
(node) of the graph, are composed of all the conditional possibilities different from 1 attached
to the node. More precisely, the elementary base associated with the variable $A_i$ is:

$\Sigma_{A_i} = \{(\neg a_i \vee \neg Pa_i, 1 - \alpha_i) : \Pi(a_i \mid Pa_i) = \alpha_i \in \Pi G \text{ and } \alpha_i \neq 1\}$.

Namely, each conditional possibility is transformed into a necessity formula which is
the material counterpart of a conditional (remember that $N(p \mid q) = 1 - \Pi(\neg p \mid q)$).

The following proposition shows that applying the chain rule on the graph gives the
same result as combining the possibility distributions associated with the elementary bases:

Proposition 1 Let $\Pi G$ be a causal graph and $\pi_G$ the joint distribution obtained from
$\Pi G$ using the chain rule. Let $\Sigma_{A_i}$ be the possibilistic base associated with the node
$A_i$, and $\pi_{A_i}$ be its possibility distribution using Definition 1. Then,
$\pi_G = \square_{i=1,\dots,n}\, \pi_{A_i}$.

Now, to compute the global base associated with $\Pi G$ we use the results of [3], which
provide syntactic counterparts of combining bases with the minimum or product operators:

Proposition 2 Let $\Sigma_1$ and $\Sigma_2$ be two bases, $\pi_1$ and $\pi_2$ their associated distributions.

1. The possibilistic base associated with $\min(\pi_1, \pi_2)$ is: $\Sigma_1 \cup \Sigma_2$.

2. The possibilistic base associated with $\pi_1 \times \pi_2$ is:

$\Sigma_1 \cup \Sigma_2 \cup \{(\phi_i \vee \psi_j,\ \alpha_i + \beta_j - \alpha_i \beta_j) : (\phi_i, \alpha_i) \in \Sigma_1 \text{ and } (\psi_j, \beta_j) \in \Sigma_2\}$.

Then, the base associated with a graph is obtained by the successive application of the
previous proposition on the elementary bases $\Sigma_{A_i}$'s.
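The two syntactic combinations of Proposition 2 can be sketched on clausal bases. The encoding is an assumption made for illustration: a clause is a `frozenset` of literal strings, a leading `-` marks negation, and tautological disjunctions are dropped because they carry no information.

```python
from fractions import Fraction

def negate(literal):
    """Complement of a literal written as 'x' or '-x'."""
    return literal[1:] if literal.startswith("-") else "-" + literal

def is_tautology(clause):
    return any(negate(lit) in clause for lit in clause)

def combine_min(base1, base2):
    """Syntactic counterpart of min(pi_1, pi_2): plain union of the bases."""
    return list(base1) + list(base2)

def combine_product(base1, base2):
    """Counterpart of pi_1 * pi_2: union plus pairwise disjunctions."""
    combined = list(base1) + list(base2)
    for clause1, a in base1:
        for clause2, b in base2:
            clause = clause1 | clause2
            if not is_tautology(clause):
                combined.append((clause, a + b - a * b))
    return combined

# Hypothetical elementary bases with illustrative weights.
SIGMA_R = [(frozenset({"-r"}), Fraction(1, 3))]
SIGMA_S = [(frozenset({"-r", "-s"}), Fraction(1, 2))]
```

For instance, `combine_product(SIGMA_R, SIGMA_S)` adds the clause `-r or -s` with weight 1/3 + 1/2 - 1/6 = 2/3, which then subsumes the copy of the same clause weighted 1/2.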

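Example 5. Consider the following $\Pi G$. The set of variables is $V = \{R, S, W\}$.

[Figure: DAG over the variables R, S and W.]

1- Let us consider $\Pi G_m$ where min-based conditional possibility degrees are computed from the
distribution given in Example 1:

[Tables: $\Pi(R)$, $\Pi(S \mid R)$, $\Pi(W \mid SR)$.]

We have: $\Sigma_R = \{(\neg r, 1/3)\}$, $\Sigma_S = \{(\neg r \vee \neg s, 2/3)\}$ and
$\Sigma_W = \{(r \vee s \vee \neg w, 1/3),\ (r \vee \neg s \vee w, 1/3)\}$.
Then, the possibilistic base $\Sigma_m$ associated with $\Pi G_m$ is:

$\Sigma_m = \Sigma_R \cup \Sigma_S \cup \Sigma_W = \{(\neg r, 1/3),\ (r \vee s \vee \neg w, 1/3),\ (r \vee \neg s \vee w, 1/3),\ (\neg r \vee \neg s, 2/3)\}$.

2- Let us now consider $\Pi G_\times$ where product-based conditional possibilities are also computed
from the distribution of Example 1:

[Tables: $\Pi(R)$, $\Pi(S \mid R)$, $\Pi(W \mid SR)$.]

We have: $\Sigma_R = \{(\neg r, 1/3)\}$, $\Sigma_S = \{(\neg r \vee \neg s, 1/2)\}$ and
$\Sigma_W = \{(r \vee s \vee \neg w, 1/3),\ (r \vee \neg s \vee w, 1/3)\}$.
We first compute the combination of $\Sigma_R$ and $\Sigma_S$. We get
$\Sigma_{RS} = \{(\neg r, 1/3),\ (\neg r \vee \neg s, 1/2),\ (\neg r \vee \neg s, 2/3)\}$, which is
equivalent to $\{(\neg r, 1/3),\ (\neg r \vee \neg s, 2/3)\}$. Combining $\Sigma_{RS}$ and $\Sigma_W$, we get:

$\Sigma_\times = \{(\neg r, 1/3),\ (\neg r \vee \neg s, 2/3),\ (r \vee s \vee \neg w, 1/3),\ (r \vee \neg s \vee w, 1/3)\}$.

We can easily check that the knowledge bases associated with the (slightly) different
$\Pi G_m$ and $\Pi G_\times$ are equivalent, and even identical in this example. This is expected
since both graphs are built from the same distribution.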

5.2 Transforming a possibilistic base S into a min-based graph IIGm 

The transformation from a possibilistic base $\Sigma$ into $\Pi G_m$ has been given in [1]. We only
illustrate the idea by an example. First, an ordering of variables $A_1, \dots, A_n$ should be
chosen. This ordering means that the parents of $A_i$ should be among $A_{i+1}, \dots, A_n$.
Then we proceed by successive decompositions of $\Sigma$. At the first step, $\Sigma$ is
decomposed, in an equivalent way, into $\Sigma_{A_1} \cup \Sigma_L$, where $\Sigma_{A_1}$ allows us
to determine the parents of $A_1$ and the conditional possibilities attached to $A_1$. With the
same procedure $\Sigma_L$ is decomposed again into $\Sigma_{A_2} \cup \Sigma_{L'}$, and so on. The
example only illustrates the decomposition process from $\Sigma$ to $\Sigma_{A_1} \cup \Sigma_L$.

Example 6. Let $\Sigma$ be the base of Example 4. Let $\{W, S, R\}$ be the ordering of the variables.
The decomposition of $\Sigma$ into $\Sigma_W \cup \Sigma_L$ follows three steps:

- The first step consists in putting in $\Sigma_W$ all clauses of $\Sigma$ containing $W$. We let
$\Sigma_L = \Sigma - \Sigma_W$. We get
$\Sigma_W = \{(s \vee \neg w, 1/3),\ (\neg s \vee w, 1/3),\ (\neg r \vee w, 1/3)\}$ and
$\Sigma_L = \{(\neg r \vee \neg s, 2/3)\}$.
Then, we remove from $\Sigma_W$ all strictly subsumed¹ formulas in $\Sigma$ (since they are
redundant and can induce fictitious dependence relations). $\Sigma_W$ does not contain subsumed
formulas, so it remains unchanged.

- The second step consists in determining the parents of $W$. They are the variables in
$\{S, R\}$ (i.e., the set of candidate parents of $W$) which are involved in at least one clause of
$\Sigma_W$. Then, $Par(W) = \{S, R\}$.

- Lastly, the third step consists in computing the conditionals $\Pi(W \mid \omega_{Par(W)})$. First,
we replace $\Sigma_W$ by its complete extension² w.r.t. $Par(W)$. Then, $\Sigma_W$ becomes
$\{(r \vee s \vee \neg w, 1/3),\ (\neg r \vee s \vee \neg w, 1/3),\ (r \vee \neg s \vee w, 1/3),\ (\neg r \vee \neg s \vee w, 1/3),\ (s \vee \neg r \vee w, 1/3)\}$.

Then, if $(a_i \vee x, \alpha) \in \Sigma_W$ and $\Sigma \vdash x$, then $(a_i \vee x, \alpha)$ is removed
from $\Sigma_W$ and $(x, \alpha)$ is added to $\Sigma_L$, where $a_i$ is either $w$ or $\neg w$.
Finally, we compute the conditional possibilities from $\Sigma_W$ as follows:
$\Pi(a_i \mid P_W) = 1 - \alpha$ if $(\neg a_i \vee \neg P_W, \alpha) \in \Sigma_W$, and
$\Pi(a_i \mid P_W) = 1$ otherwise.

For example $\Pi(w \mid \neg s \neg r) = 2/3$ since $(s \vee r \vee \neg w, 1/3) \in \Sigma_W$.

It can be checked [1] that the constructed graph is a DAG, and that $\pi_\Sigma = \pi_G$, where
$\pi_G$ is the possibility distribution obtained from the constructed $\Pi G$ using the chain rule.



5.3 Transforming a possibilistic base S into a product-based graph UGx 

Referring to the decomposition of a joint distribution, we have:

$\pi_\Sigma(A_1 \cdots A_n) = \Pi(A_1 \mid A_2 \cdots A_n) * \Pi(A_2 \cdots A_n)$.

Therefore, the construction of the product-based graph is done in two different steps:
one consists in computing the conditional possibilities $\Pi(A_1 \mid A_2 \cdots A_n)$, and the other
consists in constructing a knowledge base $\Sigma_P$ s.t. $\pi_{\Sigma_P} = \Pi(A_2 \cdots A_n)$. Note
that in the first step, the aim is to identify the parents of $A_1$, since
$\Pi(A_1 \mid A_2 \cdots A_n) = \Pi(A_1 \mid Par(A_1))$.

-Step 1. Computing the parents of $A_1$

The determination of the parents of $A_1$ is done in an incremental way. First, we remove
all subsumed formulas and tautologies from $\Sigma$. Then, we take as $Par(A_1)$ the set of
variables from $\{A_2, \dots, A_n\}$ which are involved in at least one clause containing $A_1$.
These are the obvious parents of $A_1$. However, contrary to the construction of $\Pi G_m$,
there may exist other "hidden" variables whose observation influences the certainty degree
of $A_1$. To see whether $Par(A_1)$ has to be extended or not, we proceed in the following way:

1- Take an instance of the parents of $A_1$ which is consistent with $\Sigma$. Add it to $\Sigma$.

2- Compute the degree of inference $\alpha$ of $a_1$ (resp. $\neg a_1$) from $\Sigma$.

3- If $\alpha > 0$, then for each clause having a weight greater than $\alpha$, add the variables
involved in this clause to $Par(A_1)$.

¹ $(\phi, \alpha)$ is said to be strictly subsumed by $\Sigma$ if $\Sigma_{>\alpha} \vdash \phi$, where
$\Sigma_{>\alpha}$ is composed of the classical formulas of $\Sigma$ having a weight strictly greater
than $\alpha$.

² For instance, if $B$ and $C$ are parents of $A_1$ and $(a \vee b, \alpha) \in \Sigma_{A_1}$, then we
replace this clause by $\{(a \vee b \vee c, \alpha),\ (a \vee b \vee \neg c, \alpha)\}$ to extend the
clause $(a \vee b, \alpha)$ to all of the parents.
Now, once the parents of $A_1$ are fixed, the determination of $\Pi(a_1 \mid \omega_{Par(A_1)})$ is
given as follows: Let $(x_1, \dots, x_n)$ be an instance of $Par(A_1)$, and $a_1$ an instance of
$A_1$. Recall that by definition:
$\Pi(a_1 \mid x_1 x_2 \cdots x_n) = \dfrac{\Pi(a_1 x_1 \cdots x_n)}{\Pi(x_1 \cdots x_n)}$,
and that $\Pi(a_1 \mid x_1 x_2 \cdots x_n) = 1$ if $\Pi(x_1 \cdots x_n) = 0$.

Syntactically, it can be checked that: $\Pi(\phi) = 1 - Inc(\Sigma \cup \{(\phi, 1)\})$, where
$Inc(\Sigma) = \max\{a_i : \Sigma_{\ge a_i} \text{ is inconsistent}\}$, and $\Sigma_{\ge a_i}$ is the set
of formulas in $\Sigma$ whose weight is at least equal to $a_i$. (We recall that $\Sigma$ is assumed
to be consistent.) Therefore, to compute $\Pi(a_1 \mid x_1 \cdots x_n)$:

1. Add $\{(x_1, 1), \dots, (x_n, 1)\}$ to $\Sigma$. Let $\Sigma'$ be the result of this step.

2. Compute $h = 1 - Inc(\Sigma')$ ($h$ represents $\Pi(x_1 \cdots x_n)$).

3. Add $\{(a_1, 1)\}$ to $\Sigma'$. Let $\Sigma''$ be the result of this step.

4. Compute $h' = 1 - Inc(\Sigma'')$ ($h'$ represents $\Pi(a_1 x_1 \cdots x_n)$). Then,
$\Pi(a_1 \mid x_1 \cdots x_n) = 1$ if $h = 0$, and $\Pi(a_1 \mid x_1 \cdots x_n) = h'/h$ otherwise.
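The four steps above can be sketched directly. This is a brute-force illustration under stated assumptions: formulas are Python predicates, consistency is checked by enumerating interpretations over three variables, the inconsistency degree of a consistent base is taken to be 0, and the base `BASE` with weights 1/3 and 2/3 is hypothetical.

```python
from fractions import Fraction
from itertools import product

VARS = ("r", "s", "w")
ONE = Fraction(1)

def consistent(formulas):
    """Brute-force satisfiability of a list of predicates over VARS."""
    return any(all(f(dict(zip(VARS, bits))) for f in formulas)
               for bits in product((False, True), repeat=len(VARS)))

def inc(base):
    """Inc: highest weight a such that the formulas weighted at least a are inconsistent."""
    for level in sorted({weight for _, weight in base}, reverse=True):
        if not consistent([f for f, weight in base if weight >= level]):
            return level
    return Fraction(0)

def conditional(base, target, context):
    """Product-based Pi(target | context) following steps 1 to 4."""
    sigma1 = list(base) + [(x, ONE) for x in context]   # step 1
    h = ONE - inc(sigma1)                               # step 2
    if h == 0:
        return ONE
    sigma2 = sigma1 + [(target, ONE)]                   # step 3
    return (ONE - inc(sigma2)) / h                      # step 4

# Hypothetical base with illustrative weights.
BASE = [
    (lambda m: m["s"] or not m["w"], Fraction(1, 3)),
    (lambda m: (not m["s"]) or m["w"], Fraction(1, 3)),
    (lambda m: (not m["r"]) or m["w"], Fraction(1, 3)),
    (lambda m: (not m["r"]) or (not m["s"]), Fraction(2, 3)),
]

def s(m): return m["s"]
def not_r(m): return not m["r"]
def w(m): return m["w"]
def not_w(m): return not m["w"]
```

Scanning the weights from highest to lowest is enough because the cut at a lower level contains the cut at every higher level, so inconsistency is monotone along the scan.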

Example 7. Let us illustrate the computation of $\Pi(\neg w \mid s \neg r)$. We first add the instance
$\{(s, 1), (\neg r, 1)\}$ to $\Sigma$. We get
$\Sigma' = \{(s, 1),\ (\neg r, 1),\ (s \vee \neg w, 1/3),\ (\neg s \vee w, 1/3),\ (\neg r \vee w, 1/3),\ (\neg r \vee \neg s, 2/3)\}$.
We have $Inc(\Sigma') = 0$. Then, $h = 1$.

We now add $(\neg w, 1)$ to $\Sigma'$; we get
$\Sigma'' = \{(\neg w, 1),\ (s, 1),\ (\neg r, 1),\ (s \vee \neg w, 1/3),\ (\neg s \vee w, 1/3),\ (\neg r \vee w, 1/3),\ (\neg r \vee \neg s, 2/3)\}$.
We have $Inc(\Sigma'') = 1/3$. Then, $h' = 2/3$. Hence,
$\Pi(\neg w \mid s \neg r) = h'/h = 2/3$. In a similar way, we get the following conditional possibilities:

[Table: $\Pi(W \mid SR)$.]

-Step 2: Computing the marginal base $\Sigma_P$

Let us first define the possibility distribution $\pi_{a_1}$ as follows:

$\pi_{a_1}(\omega) = \pi(\omega)$ if $\omega \models a_1$ and $\pi_{a_1}(\omega) = 0$ otherwise.

$\pi_{\neg a_1}$ is similarly defined. Then, it can be checked that the possibilistic bases
associated with $\pi_{a_1}$ and $\pi_{\neg a_1}$ are $\Sigma_{a_1} = \Sigma \cup \{(a_1, 1)\}$ and
$\Sigma_{\neg a_1} = \Sigma \cup \{(\neg a_1, 1)\}$.
$\Sigma_{a_1}$ (resp. $\Sigma_{\neg a_1}$) can be simplified by removing all clauses of the form
$(a_1 \vee x, \alpha)$ (resp. $(\neg a_1 \vee x, \alpha)$) since they are subsumed by $(a_1, 1)$
(resp. $(\neg a_1, 1)$). Also, clauses of the form $(\neg a_1 \vee x, \alpha)$ (resp.
$(a_1 \vee x, \alpha)$) are reduced into $(x, \alpha)$ since $(a_1, 1)$ and
$(\neg a_1 \vee x, \alpha)$ (resp. $(\neg a_1, 1)$ and $(a_1 \vee x, \alpha)$) imply $(x, \alpha)$,
which subsumes $(\neg a_1 \vee x, \alpha)$ (resp. $(a_1 \vee x, \alpha)$).

Our aim is to compute the base $\Sigma_P$ associated with $\Pi(A_2 \cdots A_n)$, since $\Sigma_P$ will
be used in place of $\Sigma$ for computing the parents of $A_2$. We can check that the possibilistic
bases associated with the distributions $\pi'_{a_1}$ and $\pi'_{\neg a_1}$ resulting from the
marginalization of $\pi_{a_1}$ and $\pi_{\neg a_1}$ on $\{A_2, \dots, A_n\}$ are just
$\Sigma_1 = \Sigma_{a_1} - \{(a_1, 1)\}$ and $\Sigma_2 = \Sigma_{\neg a_1} - \{(\neg a_1, 1)\}$
respectively. We are now able to provide the possibilistic base associated
with $\Pi(A_2 \cdots A_n)$ by noticing that:

$\Pi(A_2 \cdots A_n) = \max(\pi'_{a_1}(A_2 \cdots A_n),\ \pi'_{\neg a_1}(A_2 \cdots A_n))$.

Thus, $\Sigma_P$ is the syntactic counterpart [3] of the max operation applied to $\Sigma_1$ and
$\Sigma_2$:

$\Sigma_P = \{(\phi_i \vee \psi_j, \min(\alpha_i, \beta_j)) : (\phi_i, \alpha_i) \in \Sigma_1,\ (\psi_j, \beta_j) \in \Sigma_2\}$.

Example 7. (continued) Let us consider again the base given in Example 4. We start with the
variable $W$. We first have
$\Sigma_w = \{(s \vee \neg w, 1/3),\ (\neg s \vee w, 1/3),\ (\neg r \vee w, 1/3),\ (\neg r \vee \neg s, 2/3),\ (w, 1)\}$.
We remove the clauses containing $w$; we get
$\Sigma_w = \{(s \vee \neg w, 1/3),\ (\neg r \vee \neg s, 2/3),\ (w, 1)\}$. Now,
we replace the clause $(s \vee \neg w, 1/3)$ by $(s, 1/3)$. Hence,
$\Sigma_w = \{(s, 1/3),\ (\neg r \vee \neg s, 2/3),\ (w, 1)\}$.
In a similar way, we get:
$\Sigma_{\neg w} = \{(\neg s, 1/3),\ (\neg r, 1/3),\ (\neg r \vee \neg s, 2/3),\ (\neg w, 1)\}$. Then,
$\Sigma_P = \{(\neg r, 1/3),\ (\neg r \vee \neg s, 2/3)\}$. Then, reapplying Step 1 leads to

[Tables: $\Pi(R)$, $\Pi(S \mid R)$.]

It is easy to check that we recover the same tables given in Example 5, point 2.

6 Concluding discussion 

This paper has shown how to translate any of the three basic compact representation
formats (logical, graphical, or comparative) into another, in the setting of qualitative or
numerical possibility theory. Each translation guarantees that the underlying possibility
distribution remains the same. These correspondences are useful if we have to combine
pieces of information expressed in different formats, and to check their consistency.
Each compact format has its interest for communication purposes, either for modelling
expert knowledge, or for supplying information to the user. Besides, from an inference
point of view, the logical and the graphical formats are the most interesting ones. There
exist computational machineries of reasonable complexity [5] in possibilistic logic, and
local computational methods are under development for the graphical representation
(which also extends to non-binary variables). However, each compact representation is
not unique, since there exist semantically equivalent logical (resp. graphical, comparative)
bases which differ syntactically, as suggested by the example. Then, a procedure
putting the resulting bases under some standard form may be needed. Moreover, there
exists another logical format (omitted here for the sake of brevity) which is also of
interest: the logical description of the different sets of interpretations having the same
possibility level. This can be easily derived from the distribution, and the direct
computation from the possibilistic logic bases is left for further research.
Abstract. Conic fitting from image points is a very old topic in pattern recognition.
We propose here some new ideas for handling the difficult situations where
the noisy data span a small section of the conic. This new fitting process explicitly takes
into account the maximum and minimum total arc length of the conic
curves to constrain the conic search space. Compared with some reference procedures,
our algorithm gives improved results. A confidence envelope is then estimated
to direct the search for continuations of the ellipse. Considering the data
organization resolved, we then propose a complete extraction scheme based on a
fuzzy set representation of the fitting.



1 Introduction 

One of the basic tasks in pattern recognition and computer vision is the fitting of geometric
primitives to a set of points. The use of primitive models allows reduction and
simplification of data and, consequently, faster and simpler processing. A very important
primitive is the ellipse, which, being the perspective projection of a circle, is exploited in
many applications of computer vision such as 3-D vision and object recognition, medical
imaging, and industrial inspection. Thanks to its many geometric properties and its
different representations, the elliptic model is an ideal experimentation field for
estimation. In principle, two kinds of approaches can be found: voting/clustering
methods and optimization methods.

• The most popular method belonging to the first group is the Hough transform (HT).
The HT is a robust method of parameter estimation, which does not require any spatial
organization of the data beforehand. This method makes it possible to detect several
overlapping and occluding ellipses (for a review see [1]). However, this approach has some
drawbacks: it does not necessarily produce a unique solution, as a) it can be difficult
to detect the peak, which can be spread across a high number of bins; b) the search
space is multidimensional and hence sparse, which can make the search for the peak
difficult unless a large number of data points are available; and c) choosing the sizes of
the bins in the accumulator is problematic. More recently, fuzzy clustering methods
have been adapted to the problem of ellipse detection (see [2] for a review). Compared
to the Hough-based methods, these approaches require fewer computations and less memory.

S. Benferhat and P. Besnard (Eds.): ECSQARU 2001, LNAI 2143, pp. 432-443, 2001. 

© Springer-Verlag Berlin Heidelberg 2001 
• The least squares based fitting methods have also received much attention, especially the
choice of the optimization criteria [3], [4], [5], [6]. However, there are very few works
on global extraction methods. We can mention the work of P.L. Rosin et al. [7], whose
multistage algorithm segments connected points into a combination of
representations such as straight lines and conic sections. M. Li also develops a method
of 2D-shape description in terms of straight and elliptic segments based on the
Minimum Description Length criterion [8]. Most of these methods do not deal with the
problem of fitting representations to disconnected pixels. T. Ellis et al. tackle ellipse
fitting as a three-stage process [9]. At first, contours are decomposed into straight and
curved parts. Ellipses are initially fitted to the detected arc segments. These initial fits are
then improved by extending the arcs, using existing edge connectivity information.
Nevertheless, the performance of these methods is closely linked to the fact that they require
data from a large proportion of the ellipse. The results of the fitting problem for short
curve sections are generally unstable or inaccurate. It is necessary to add some
information, as T. Ellis does by exploiting the connectivity of edges in the scene and initiating
fitting on connected edges. Similarly, J. Porrill has developed a linear bias correction
for the extended Kalman filter that allows him to predict a confidence region to direct
the search for the ellipse continuations [10].

In this paper, we propose some new ideas for the ellipse fitting/detection problem. First,
the fitting problem is solved from the polar representation of the ellipse [11]. Contrary
to [11], we propose to estimate the parameters and the parametrization separately.
We show that the optimal parametrization is the solution of a fourth-degree
equation in one unknown. Although the parametrization may seem to be a drawback of our
method, its contribution is predominant in improving the parameter estimation in the case
of short sections and in a noisy context. Indeed, instability of the fit for short noisy
conic segments is a serious practical problem. By explicitly integrating, by way of a
scale factor, the maximum and minimum total arc length in the cost function, we
constrain the conic search space. If we suppose that the image primitives have known
bounded dimensions, we then determine an analyzing envelope taking into account the
uncertainty of the solution. In a second step, we propose a complete detection scheme
based on a fuzzy decision stage. At last, our algorithm is applied to the detection of
mushrooms developing on wheat leaves.



2 Ellipse fitting in parametric form 

2.1 Principle 

Let the data points $X_i = (x_i, y_i)^T$, $i = 1, \dots, N$, be given in the plane. These points
describe an ellipse if they verify the parametric system:

$$X_i = X_0 + A\,P(\theta_i) = X_0 + R(\alpha)\,F\,P(\theta_i) \qquad (1)$$

where $X_0 = (x_0, y_0)$ are the coordinates of the center,
$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is the matrix of the dimensional parameters and
$P(\theta_i) = (\cos\theta_i, \sin\theta_i)^T$ the parameterization of the ellipse. $A$ can also be
expressed as a function of the canonical parameters of the ellipse
by the matrix $F = \mathrm{diag}(\lambda_1, \lambda_2)$, where $\lambda_1, \lambda_2$ are the lengths of
the semi-axes ($\lambda_1 > \lambda_2$), and the rotation matrix $R(\alpha)$ that represents its
angular position $\alpha$ in the plane. To fit an ellipse, we need to estimate the parameters from
the data points. This standard problem is classically solved in the least squares sense by
minimizing the sum of the squares of the distances between the data points and the ellipse:

$$\Phi(X_i; \theta_i, A, X_0) = \sum_{i=1}^{N} \|G_i\|^2 = \sum_{i=1}^{N} \|X_i - X_0 - R(\alpha)\,F\,P(\theta_i)\|^2 \qquad (2)$$

This is equivalent to solving the nonlinear least squares problem:

$$G_i = X_i - X_0 - R(\alpha)\,F\,P(\theta_i) \approx 0, \quad \forall i \in [1, \dots, N] \qquad (3)$$

A classical way to solve this nonlinear problem is to use the well-known iterative
Gauss-Newton method. Given that the Jacobian matrix is sparse, we can modify its structure, as
Gander et al. do [11], by using Givens transformations and compute the
QR decomposition only on a block. Then the correction vector is given by back-substitution.
The initial values of the parameters and the parametrization are obtained by
fitting a circle to the data points. We propose in this section to estimate the parameters and
the parametrization separately. In (2), the product $R(\alpha)F$ is replaced by
the matrix $A$ and the minimization of (2) is decomposed as follows:

$$\min_{A, X_0, \theta_i} \Phi(X_i; \theta_i, A, X_0) = \min_{\theta_i}\{\min_{A, X_0} \Phi(X_i, \theta_i; A, X_0)\} = \min_{\theta_i} \Phi(X_i, A, X_0; \theta_i) \qquad (4)$$

By considering the parametrization known and fixed, the problem becomes linear and
the minimization of (2) is direct:

$$\Phi(X_i, \theta_i; A, X_0) = \sum_{i=1}^{N} (x_i - h(\theta_i)\,q_x)^2 + (y_i - h(\theta_i)\,q_y)^2 \qquad (5)$$

where $h(\theta_i) = [1, \cos\theta_i, \sin\theta_i]$, $q_x = [x_0, a, b]^T$ and $q_y = [y_0, c, d]^T$.
Or in its matrix form:

$$\Phi = (X - Hq_x)^T (X - Hq_x) + (Y - Hq_y)^T (Y - Hq_y) \qquad (6)$$

The estimation of the coefficients of $A$ and $X_0$ is then obtained by computing two
pseudo-inverses, one on the x-component and the other on the y-component:

$$q_x = (H^T H)^{-1} H^T X, \qquad q_y = (H^T H)^{-1} H^T Y \qquad (7)$$
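The linear step (7) can be sketched without any numerical library. This is a minimal illustration, not the authors' code: it forms the 3x3 normal equations $H^T H q = H^T X$ explicitly and solves them by Gaussian elimination; the function names `solve3` and `fit_linear` are hypothetical, and a singular $H^T H$ (for instance fewer than three distinct angles) is not handled.

```python
import math

def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - tail) / a[r][r]
    return x

def fit_linear(points, thetas):
    """Return (q_x, q_y) = ([x0, a, b], [y0, c, d]) for a fixed parametrization."""
    rows = [[1.0, math.cos(t), math.sin(t)] for t in thetas]          # rows of H
    hth = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    htx = [sum(r[i] * p[0] for r, p in zip(rows, points)) for i in range(3)]
    hty = [sum(r[i] * p[1] for r, p in zip(rows, points)) for i in range(3)]
    return solve3(hth, htx), solve3(hth, hty)
```

Given noise-free points generated from known parameters and the true angles, `fit_linear` recovers those parameters up to rounding; with noisy data it returns the least squares estimate for the supplied angles only.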

$A$ and $X_0$ being fixed, we minimize (5) with respect to $\theta_i$. The derivative of (5) leads us
to look for the solutions of a quadratic equation in two unknowns:

$$(ab + cd)\,C_i^2 + (b^2 + d^2 - a^2 - c^2)\,C_i S_i - (ab + cd)\,S_i^2 - (bu + dv)\,C_i + (au + cv)\,S_i = 0 \qquad (8)$$

where $C_i = \cos\theta_i$, $S_i = \sin\theta_i$, $u = x_i - x_0$ and $v = y_i - y_0$. To make the search
for the solutions simpler, we replace the parameters $(a, b, c, d)$ by the canonical parameters of
the ellipse $(\lambda_1, \lambda_2, \alpha)$ (see eq. (1)). Then (8) is simplified and becomes:

$$U S_i - V C_i + W C_i S_i = 0 \qquad (9)$$

with $U = \lambda_1 (u \cos\alpha - v \sin\alpha)$, $V = \lambda_2 (u \sin\alpha + v \cos\alpha)$ and
$W = \lambda_2^2 - \lambda_1^2$. Then, dividing (9) by either $C_i$ or $S_i$ and introducing the variable

$$z_i = \tan(\theta_i \bmod \pi) = \frac{S_i}{C_i} \qquad (10)$$

as unknown, we get a polynomial equation $p(z_i)$ of degree four in one unknown:



$$z_i^4 - \frac{2V}{U}\,z_i^3 + \frac{U^2 + V^2 - W^2}{U^2}\,z_i^2 - \frac{2V}{U}\,z_i + \frac{V^2}{U^2} = 0 \qquad (11)$$
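The passage from (9) to a quartic in $z_i$ can be spelled out as follows. This is a sketch of one possible derivation, assuming $C_i \neq 0$ and $W \neq 0$; it uses only the symbols $U$, $V$, $W$, $C_i$, $S_i$ and $z_i$ defined in the text.

```latex
\begin{align*}
& U S_i - V C_i + W C_i S_i = 0
  && \text{equation (9)} \\
& U z_i - V + W S_i = 0
  && \text{divide by } C_i,\ z_i = S_i / C_i \\
& S_i = \frac{V - U z_i}{W}, \qquad S_i^2 = \frac{z_i^2}{1 + z_i^2}
  && \text{since } z_i^2 = \frac{S_i^2}{1 - S_i^2} \\
& (V - U z_i)^2 \,(1 + z_i^2) = W^2 z_i^2
  && \text{eliminate } S_i \\
& U^2 z_i^4 - 2UV z_i^3 + (U^2 + V^2 - W^2)\, z_i^2 - 2UV z_i + V^2 = 0
\end{align*}
```

Dividing the last line by $U^2$ (when $U \neq 0$) gives a monic polynomial. Because the elimination squares $S_i$, each real root has to be checked against (9) up to the sign ambiguity introduced by working modulo $\pi$.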



This equation gives four solutions $(z_1, z_2, z_3, z_4)$ with at least two real zeros. The
complex solutions are removed and we concentrate on the real solutions. We
must also note that the solution $\theta_i$ according to (10) is defined to within $\pi$. Then, the
solution verifies one of these two conditions: $\pm U S_i \mp V C_i + W C_i S_i = 0$. If (11)
gives $k$ real solutions $\theta_k = \tan^{-1}(z_k)$ (with $k \in [2..4]$), we select the solution
$\theta_{k^*}$ that achieves $\min_k \|G_i(\theta_k)\|$. Then, we obtain an iterative least squares
fitting procedure:

■ Step 1: we compute $\theta_i^{(t)}$. In this case, good starting values for the $\theta_i^{(0)}$ are
obtained by fitting an ellipse by minimizing the algebraic distance.

■ Step 2: we then compute $X_0^{(t)}$ and $A^{(t)}$ with (7).

■ Step 3: we compute the real solutions of (11) and then $\theta_i^{(t+1)}$.

■ Step 4: we set $t \leftarrow t + 1$ and if $|\theta^{(t)} - \theta^{(t-1)}| > \varepsilon$ then go to
step 1, otherwise stop.

Figure 1.a shows the results obtained, which are compared with the classical iterative
Gauss-Newton algorithm (see [11]). 30 points (black dots) sample a high curvature
section of an ellipse, and Gaussian noise with a standard deviation of 2 pixels is added
to each sample point. The ideal ellipse is represented by a solid line; the algebraic fitting
initialization is also shown, "-o-" represents our fitting (55 iterations) and "*"
the Gauss-Newton fitting (32 iterations). Figure 1.a shows that the two geometric
fits in parametric form converge to the ideal solution. Our approach seems to be more
expensive than the Gauss-Newton approach. Although the minimization over the parametrization
may seem to be a drawback of our method, we will see in the next section that its contribution
is predominant in improving the parameter estimation in the case of short sections.



2.2 Constrained fitting 

When data points are distributed on a short section of an ellipse, most classical
fitting methods are unstable or diverge. Because the density of data points is small,
the problem is badly conditioned. In that case, an acceptable solution generally
requires adding some prior information about the data (noise) or about the criterion
(dimensional constraint). Although integrating the noise characteristics into the
criterion offers a distinct improvement in the estimation, it is not a determining
factor when the density of data points is too small. Besides, statistical methods such
as the re-normalization method with bias correction [3] diverge if there is not enough
data. Only prior knowledge of the elliptical characteristics then reduces the solution
space and leads to the real solution. From this idea, we propose to modify the previous
scheme so that it gives satisfactory results for short sections with distributed data
points. We consider that the elliptic shapes in images have known bounded dimensions.
This is precisely the situation we encounter in the studied microscopic images. The
classical least squares fitting method based on the algebraic distance does not seem
appropriate for taking this additional constraint into account, which is why it is
difficult to find work on this subject in the literature. One of the main reasons is
that the implicit parameters of an ellipse are correlated and have large and different
ranges of variation, which makes them unusable for adding a dimensional constraint.
A fitting criterion based on the parametric representation of the ellipse seems more
suitable for solving this problem. We therefore propose to introduce in the criterion
a scale factor $\nu$ that forces the solution to stay in a chosen parametric space. To
approach this optimal solution, the principle consists in weighting the parametrization
$\theta$ by this scale factor. We then propose to minimize the following cost function:

$$\Theta(X_i, \theta_i, \nu; A, X_0) = \sum_{i=1}^{N} \left(x_i - h(\nu\theta_i)\,q_x\right)^2 + \left(y_i - h(\nu\theta_i)\,q_y\right)^2 \qquad (12)$$



Indeed, $\theta$ characterizes the following ratio of arc length:

$$\theta = \frac{2\pi}{L} \int_0^{l} \sqrt{\dot{x}^2 + \dot{y}^2}\; dl \qquad (13)$$



where $l$ is the arc length along the curve from the starting point, and $L$ is the total
arc length of the curve. Given that the arc length $l$ of the analysed section is known,
altering the total arc length $L$ by $\nu$ implicitly introduces a bias in the results of
(12) (see fig. 1.b). This bias is highlighted by observing the evolution of the solutions
$(A_1, A_2)$ of (12) as a function of this parameter. We show that:

$$(A_1, A_2) \sim [\,\ldots,\ \ldots\,] \qquad \forall \nu \in\ ]0, 1] \qquad (14)$$

This property always holds when the ratio $\rho = l/L < 0.5$. If $\rho > 0.5$, the density
of data points is sufficient to give a satisfactory estimation of the ellipse parameters.
By analogy with the similarity property of the Fourier transform, we can see this term
as a shrinking/dilation factor of the harmonic (fig. 1.b). A good use of this factor in
the minimisation process then makes it possible to constrain the solution to stay in a
predefined area.
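As an illustration of the parametrization in (13) and of the role of the scale factor, the following Python sketch computes the scaled parameters $\nu\theta_i$ for an ordered list of contour points. The function name `arc_length_parameters` and the approximation of the arc length $l$ by cumulative chord length are assumptions made for this example; they are not taken from the method described above.

```python
import math


def arc_length_parameters(points, total_length, nu=1.0):
    """Return nu * theta_i with theta_i = 2*pi*l_i / L for ordered contour points.

    points       : list of (x, y) tuples ordered along the contour
    total_length : assumed total arc length L of the full ellipse
    nu           : scale factor applied to the parametrization
    The arc length l_i is approximated by cumulative chord length (an assumption).
    """
    thetas = []
    l = 0.0
    prev = points[0]
    for p in points:
        # Accumulate the chord length from the previous sample to this one.
        l += math.hypot(p[0] - prev[0], p[1] - prev[1])
        thetas.append(nu * 2.0 * math.pi * l / total_length)
        prev = p
    return thetas
```

With `nu < 1` the same samples are mapped onto a smaller angular range, which corresponds to assuming a longer total arc length for the same observed section.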

Formulation of the problem: if we consider that the elliptic shapes in images have
known bounded dimensions lying within the range $[L_a, L_b]$ ($L_a$ is the minimum total
arc length of the ellipses and $L_b$ is the maximum one) and that there exists a
reference ellipse parametrized by the pair $(A_1, A_2)$, then we have to find in the
interval $[L_a, L_b]$ the length $L^*$ which gives an optimal parametric solution close
to $(A_1, A_2)$ (an example is given in figure 1.c).

Indeed, restricting the solution to stay close to $(A_1, A_2)$ avoids local minima with
noisy data. Because the direct minimisation of (12) as a function of $\nu$ seems
difficult, we propose a dichotomic process to approach this solution: given the
interval $[L_a, L_b]$ and $l$ known,

• Step 1: initialization of the parametrization from (13). We set $L = (L_a + L_b)/2$.

• Step 2: we compute two scale factors $\nu_a = (L + L_a)/2L$ and $\nu_b = (L + L_b)/2L$.

• Step 3: we determine from (5) two elliptical candidates $E_a$ and $E_b$:

$$E_a \leftarrow \min_{A, X_0} \Theta(X_i, \nu_a\theta_i; A, X_0) \qquad (15)$$

$$E_b \leftarrow \min_{A, X_0} \Theta(X_i, \nu_b\theta_i; A, X_0) \qquad (16)$$

To avoid local minima, we select whichever of the ellipses $E_a$ or $E_b$ has dimensions
closest to $(A_1, A_2)$. This allows us to readapt the length interval by modifying its
bounds as follows:

If $E_a$ is chosen then $L_a \leftarrow L_a$, $L_b \leftarrow (L + L_b)/2$ and
$L \leftarrow (L_a + L_b)/2$; else $L_b \leftarrow L_b$, $L_a \leftarrow (L + L_a)/2$ and
$L \leftarrow (L_a + L_b)/2$.

• Step 4: we then compute in (11) the ellipse parameterization on the chosen ellipse
and go back to step 1 if $|L^{k} - L^{k-1}| > \eta$.
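The interval update of steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `fit_ellipse` is a hypothetical callable standing in for the minimisation of (12), and `toy_fit` is an invented surrogate (fitted axes proportional to the assumed length $\nu L$) used only to exercise the loop.

```python
import math


def dichotomic_length_search(fit_ellipse, ref_axes, L_a, L_b, eta=1e-3, max_iter=200):
    """Shrink [L_a, L_b] around the total arc length whose fit is closest to ref_axes.

    fit_ellipse(nu, L) is a hypothetical callable returning the fitted axes (A1, A2)
    for scale factor nu and current length L; it stands in for minimising (12).
    """
    L = 0.5 * (L_a + L_b)                                  # Step 1
    for _ in range(max_iter):
        nu_a = (L + L_a) / (2.0 * L)                       # Step 2
        nu_b = (L + L_b) / (2.0 * L)
        d_a = math.dist(fit_ellipse(nu_a, L), ref_axes)    # Step 3: two candidates
        d_b = math.dist(fit_ellipse(nu_b, L), ref_axes)
        if d_a <= d_b:
            L_b = 0.5 * (L + L_b)                          # E_a chosen: lower upper bound
        else:
            L_a = 0.5 * (L + L_a)                          # E_b chosen: raise lower bound
        L_new = 0.5 * (L_a + L_b)
        if abs(L_new - L) <= eta:                          # Step 4: stopping test
            return L_new
        L = L_new
    return L


def toy_fit(nu, L, true_length=320.0, true_axes=(60.0, 30.0)):
    """Toy surrogate (an assumption): fitted axes scale with the assumed length nu * L."""
    s = nu * L / true_length
    return (true_axes[0] * s, true_axes[1] * s)
```

With the toy surrogate, `dichotomic_length_search(toy_fit, (60.0, 30.0), 200.0, 400.0)` narrows the interval towards the length hard-coded in `toy_fit`; each iteration reduces the interval width by a quarter.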

Results: we consider the discrete contour of the ideal ellipses with 70 percent of
the boundary points missing. We also consider a very noisy context by adding, with a
regular step (every 15 pixels), zero-mean Gaussian noise with a standard deviation of
2. To analyse the quality and the stability of the method, this operation was repeated
50 times. The constraints are initialized as follows: we suppose that the total arc
length of the studied ellipses lies within the range $[L_a = 200, L_b = 400]$ and that
the dimensions $(A_1 = 60, A_2 = 30)$ of the reference ellipse are the same as those of
the ideal ellipse. Figure 2 (top: high eccentricity section, bottom: low eccentricity
section) presents a visual comparison of the fitting results between our algorithm
(fig. 2.a) and some reference methods: Fitzgibbon (b), Taubin (c) and Gander (d). The
continuous dark line represents the ideal ellipse, the noisy dark short section is one
of the 50 generated sections and the grey lines correspond to the fitting results of
the 50 realizations. Table 1 collects the average values of the estimated parameters
(with their standard deviations) and the average computation time required. The NES
column gives the number of Non-Ellipse Solutions returned by the previous methods
(among the 50 runs). As we can see in figure 2 and table 1, our approach and
Fitzgibbon's are the most stable in a very noisy context. In addition, the integration
of dimensional constraints in the fitting of a short section clearly improves the
estimation of the ellipse's parameters (see table 1 and figure 2.a). The computation
time given here is only for illustration, since no code optimization was attempted.
All experiments were conducted using the Matlab system on a Sun Sparc 20 workstation.
We can however notice that our algorithm seems to be the most expensive. The most
time-consuming part is the computation of the solutions of the polynomial equation (11).

Fig. 1. Results of geometric fit in parametric form (a). Evolution of the fitting result as a
function of $\nu$ (b) ($\rho = 0.4$, zero-mean Gaussian noise ($\sigma = 2$)). Example of
evolution of the analyzing interval (c).




Fig. 2. Stability comparison with the main conic-fitting algorithms. Fitting to a noised short sec- 
tion (top: high eccentricity section, bottom: low eccentricity section). 



2.3 Dimensional uncertainty 

Once this average solution, corresponding to the parameter vector
$(X_0^*, A^*, \theta^*, L^*)$, is estimated, we can determine a confidence region that
takes into account the uncertainty on the result. This envelope is obtained by least
squares fitting of two boundary ellipses $E_m$ and $E_M$ resulting respectively from the
dilation of the optimal parametrization $\theta^*$ by the scale factor
$\nu' = L_a/L^*$ and the contraction of the optimal parametrization by
$\nu'' = L_b/L^*$:

$$E_m \leftarrow \min_{A, X_0} \Theta(X_i, \nu'\theta^*; A, X_0) \qquad (17)$$








Method       x0                y0               A1                A2               (5th param.)     NES   cpu
(ideal)      100               100              60                30               ?
CLSF         98.98 ± 5.04      99.11 ± 4.27     60.07 ± 4.88      28.56 ± 3.71     43.87 ± 3.85     0     5.5 s
Fitzgibbon   81.38 ± 2.92      100.15 ± 1.65    46.28 ± 2.13      16.04 ± 2.69     35.46 ± 2.37     0     3·10^-? s
Bookstein    72.61 ± 6.07      102.37 ± 7.97    56.73 ± 24.76     9.20 ± 2.03      32.57 ± 6.07     18    2·10^-? s
Gander       175.40 ± 136.94   84.96 ± 45.94    137.24 ± 139.78   47.60 ± 34.09    41.86 ± 34.68    0     4.4 s





Table 1. Comparison of the average values of the ellipse parameters estimated by different ap- 
proaches (high eccentricity section). 



$$E_M \leftarrow \min_{A, X_0} \Theta(X_i, \nu''\theta^*; A, X_0) \qquad (18)$$

In order to observe the behaviour of this envelope according to the arc length, we
generated elliptic sections of increasing length. The experimental procedure was
as follows:

- We considered the case of a high eccentricity section.
- The arc length was varied from 15 to 80% of the ideal total arc length in steps of 10%.
- These sections were corrupted with zero-mean Gaussian noise. We tested a high noise
level corresponding to a standard deviation of 2 pixels.
- The analyzing interval $[L_a, L_b]$ is fixed as previously. The dimensions of the
reference and ideal ellipses are defined by the pairs $(A_1 = 70, A_2 = 40)$ and
$(A_1 = 60, A_2 = 30)$ respectively.
- The solution is obtained by the algorithm of section 2.2. The envelope is deduced
from relations (17) and (18).

Figure 3 presents the results of our fitting ('-*-') and its confidence envelope
(grey areas) estimated from two arc lengths (black noisy sections) corresponding to
15% (left) and 42% (middle) of the total arc length of the ideal ellipse (solid line).
For comparison, we have added the ellipse estimated by Fitzgibbon's procedure ('-o-').
As these two examples show, the envelopes give us an optimal bound on the ideal
solution even in the case of a very short section. However, in a very noisy context
($\sigma > 1.5$), an arc length between 10 and 30% of the total arc length may introduce
a bias in the orientation of the envelope, so we may "lose" the ideal solution.
Figure 3.c illustrates the evolution of the estimates $(A_1, A_2)$ (see the curves
'-*-' on figure 4) and their corresponding envelopes (grey areas). We observe these
parameters over the whole interval, i.e. from 15 up to 80% of the total arc length.
Some remarks can be made:

- An increase in the noise level decreases the regularity of the envelope's evolution.
However, the bound on the ideal solution is always assured.

- The more the arc length increases, the tighter the bound on the ideal solution becomes.

- Beyond 50% of the total arc length, the width of the envelope plateaus and the
estimate ('-*-') converges to the ideal solution (the effect of the arc length
constraint is reduced).

The same observations hold for the low eccentricity section. The estimation of an
analysing envelope is therefore useful in several ways:

- It makes the search for further ellipse data easier and thereby refines the fitting
process (see next section).
- It thus reduces the computation time of the detection procedure.






Fig. 3. Illustrations of the confidence envelopes (grey areas) estimated on two arc lengths (a) and
(b). Evolution of the size of the confidence envelopes (grey areas) estimated on increasing arc
lengths (c).




Fig. 4. Segments selected by an envelope (grey area) (a). Fuzzification of the fitting measures
(b). Strategy of decision (c).



3 Decision scheme with fuzzy rules 

In order to reduce the cost of the fitting process, we consider that the contours are
classified into "straight" and "curved" segments. This organisation step requires a
high-level description including some elementary steps such as partitioning, grouping
and classification. The estimation of the envelopes is then initialized on the "curved"
segments. We must now refine the fitting process with the segments selected by the
envelopes (fig. 4.a). However, several aspects must be considered. First, among the
candidate edges selected in the analyzing envelope, some segments do not necessarily
belong to an elliptic boundary. Moreover, the space of shapes in images is not reduced
to ellipses: some "parasitical" structures with concave boundaries may exist. It is
therefore necessary to exploit the unconnected neighbourhood in the fitting test and
also to clear up some ambiguities in the decision step. The uncertainty of the data
must be considered in the decision step in order to weight the result with a confidence
degree. To minimize wrong detections, we develop a decision scheme for the fitting
results that integrates four measures: the variance of the error of fit $e$, the ratio
$r$ between the total length of the segments used in the fitting stage and the perimeter
of the fitted ellipse, and the two lengths of the semi-axes $(A_1, A_2)$. The measure $e$
represents a distance between the fitted ellipse and the data. The smaller this
distance, the better the model fits the data. However, this measure is sensitive to
image noise, which produces irregular contours. Another factor biases the estimation:
the shapes are not perfectly elliptic. The uncertainty attached to this quantity does
not allow us to specify exactly a threshold beyond which the data no longer belong to
the ellipse. The parameter $r$ weights the membership of an ellipse according to the
quantity of data used in the fitting stage. In the same way, it is difficult to specify
exactly the intervals for $A_1$ and $A_2$. Because of the uncertain nature of the data,
we therefore propose to adopt a fuzzy representation of these measures. Every variable
is described by a set of linguistic values represented by trapezoidal fuzzy sets. The
support and the core of every set are determined experimentally. We also define three
fuzzy controllers that provide the different rules between the input and the output
(fig. 4.b). All these variables are represented by triangular and uniformly distributed
fuzzy sets forming a strict fuzzy partition. The fuzzy controllers perform a
fuzzification $\varphi_X$ of the input and a vectorial fuzzification
$\varphi_{X \times Y}$, followed by an inference operation $\rho$. In our application,
the T-norm is the min operation and the T-conorm is the max. The controller C_RL builds
a fuzzy representation of the fitting from the measures $e$ and $r$, whereas the
controller C_AB provides a fuzzy representation of the lengths $A_1$ and $A_2$ of the
fitted ellipse. The outputs of these controllers are merged by the controller C_D,
which provides a symbolic description of the results. The general decision framework
is made up of the following steps (fig. 4.c):

- First, a hierarchical and recursive procedure of grouping and fitting is carried out
on the set of segments $S_i$ selected by the confidence envelope.
- Then, a symbolic description is associated with all the candidate ellipses generated
previously. This fuzzy representation merges the fitting measures and attaches a degree
of uncertainty to the result.
- Lastly, a decision step extracts the most certain grouping among all these candidates.
The decision rule retains the grouping whose fuzzy representation belongs to some
specific fuzzy sets and has a smaller degree of specificity in these fuzzy sets.
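To make the fuzzy representation more concrete, the following Python sketch shows a trapezoidal membership function together with the min T-norm and max T-conorm named above. The linguistic values `error_is_small` and `ratio_is_large`, their breakpoints, and the single rule in `fit_is_good` are hypothetical; in the scheme described above the supports and cores are determined experimentally and the rules belong to the controllers C_RL, C_AB and C_D.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in a trapezoidal fuzzy set (support [a, d], core [b, c])."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)   # rising edge between a and b
    return (d - x) / (d - c)       # falling edge between c and d


def t_norm(u, v):
    """Fuzzy conjunction: the min operation."""
    return min(u, v)


def t_conorm(u, v):
    """Fuzzy disjunction: the max operation."""
    return max(u, v)


# Hypothetical linguistic values; the breakpoints are invented for illustration.
def error_is_small(e):
    return trapezoid(e, -1.0, 0.0, 1.0, 3.0)


def ratio_is_large(r):
    return trapezoid(r, 0.3, 0.6, 1.0, 1.1)


def fit_is_good(e, r):
    """Hypothetical rule: the fit is good if the error is small AND the ratio is large."""
    return t_norm(error_is_small(e), ratio_is_large(r))
```

A candidate with a small fit error but little supporting data then receives only a partial degree of membership instead of a hard accept or reject.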



4 Results 

We have implemented our automatic extraction scheme to detect microscopic pathogenic
fungi on a wheat leaf. As figure 5.a shows, these cells, examined under an optical
microscope (magnification 40), have shapes close to an ellipse. During its development,
the dimensions of the fungus lie between 10 and 15 µm for the width ($A_2$) and between
30 and 40 µm for the length ($A_1$). These dimensional characteristics help us to
initialize the different constraints of the procedure. However, these images are also
difficult: first, the images are noisy and blurred, and the light is not homogeneous.
Other constraints can also disturb the detection: most of the time the fungi are
grouped and overlapping, and parasitical structures (veins, germination...) are
present. Figure 5.b shows the result of the contour classification. The confidence
envelopes (grey areas) are then estimated from the significant "curved" segments
(> 20 pixels) (see fig. 5.c). Figure 5.d shows all the candidate ellipses generated in
the grouping and fitting stage. The final result of the decision stage is given in
figure 5.e. The degree of uncertainty is not illustrated. The other results presented
in figure 6 confirm the robustness of our algorithm against noise and overlapping.

Fig. 5. Main steps of the detection procedure (see text).

Fig. 6. Other examples. Results of the detection (top). All the candidate ellipses generated by the
fitting combination stage (bottom).

5 Discussion and conclusions 

This article addresses the difficult problem of handling uncertainty in the ellipse
fitting/detection process for complicated image data containing several overlapping and
occluding elliptic shapes. First, we propose an iterative parametric fitting method that
implicitly incorporates the dimensional features of the image primitives. We then
estimate a confidence envelope that gives information about the dimensional
uncertainties of these shapes. A fuzzy decision step completes the detection procedure
and gives a confidence degree to the results. This is a new form of reasoning in the
framework of ellipse detection, because most approaches give hit-or-miss results. The
generalization of our method to ellipses of different sizes is possible if we properly
widen the analysing interval in the fitting step and ignore the controller C_AB in the
decision scheme.


Abstract. As is clear to any user of software, quality control of software has
not reached the same levels of sophistication as it has with traditional 
manufacturing. In this paper we argue that this is because insufficient thought is 
being given to the methods of reasoning under uncertainty that are appropriate 
to this domain. We then describe how we have built a large-scale Bayesian 
network to overcome the difficulties that have so far been met in software 
quality control. This exploits a number of recent advances in tool support for 
constructing large networks. We end the paper by describing how the network 
was validated and illustrate the range of reasoning styles that can be modelled 
with this tool. 



1 Introduction 

Quality control for mechanical and electronic devices is now a well-developed 
science. However, as all computer users will be aware, the same is not true of quality 
control for software. There are a number of basic reasons for this. Firstly, the 
prerequisites for statistical process control, large sample sizes and repeatable 
processes, are not applicable to software development. Secondly, software failures are 
the result of design faults and not due to ageing of the software product. 
Consequently, very different techniques for quality control are needed for software 
development as compared to traditional manufacturing. 

In this paper, we will discuss some of the difficulties with quality control approaches 
that have been tried so far. We will then identify a number of requirements that need 
to be satisfied by more robust models for software quality assessment and control. We 
will then describe the large-scale Bayesian network that we have built to meet these 
requirements, and discuss its validation using a number of real-world projects. 

2 The Problem with Regression Models 

In this section we will look at the general issues relating to quality control and 
assessment in software development. In subsequent sections we will primarily be 
focusing on software defect modelling. However, it is worth phrasing the problem in 
general terms to emphasise that the longer-term goal is to apply probabilistic 
graphical models to other quality characteristics, like reliability and safety [3,7].

There are two different viewpoints of software quality as defined by Fenton and
Pfleeger [2]. The first, the external product view, looks at the characteristics and sub- 
characteristics that make up the user’s perception of quality in the final product - this 
is often called quality-in-use. Quality-in-use is determined by measuring external 
properties of the software, and hence can only be measured once the software product 
is complete. For instance quality here might be defined as freedom from defects or the 
probability of executing the product, failure free, for a defined period. 

The second viewpoint, the internal product view, involves criteria that can be used to 
control the quality of the software as it is being produced and that can form early 
predictors of external product quality. Good development processes and well- 
qualified staff working on a defined specification are just some of the pre-requisites 
for producing a defect free product. If we can ensure that the process conditions are
right, and can check intermediate products to ensure this is so, then we can perhaps
produce high quality products in a repeatable fashion.

Fig. 1: A hypothetical plot of pre-release against post-release
defects for a range of modules. Each dot represents a module.

Unfortunately the relationship between the quality of the development processes 
applied and the resulting quality of the end products is not deterministic. Software 
development is a profoundly intellectual and creative design activity with vast scope 
for error and for differences in interpretation and understanding of requirements. The 
application of even seemingly straightforward rules and procedures can result in 
highly variable practices by individual software developers. Under these 
circumstances the relationships between internal and external quality are uncertain.

Typically informal assessments of critical factors will be used during software
development to assess whether the end product is likely to meet requirements:

• Complexity measures: A complex product may indicate problems in the 
understanding of the actual problem being solved. It may also show that the 
product is too complex to be easily understood, de-bugged and maintained. 

• Process maturity: Development processes that are chaotic and rely on the heroic 
efforts of individuals can be said to lack maturity and will be less likely to 
produce quality products, repeatedly. 

• Test results: Testing products against the original requirements can give some 
indication of whether they are defective or not. However the results of the testing 
are likely only to be as trustworthy as the quality of the testing done.

Fig. 2: Actual plot of pre-release against post-release defects for a
range of modules.

The above types of evidence are often collected in a piecemeal fashion and used to 
inform the project or quality manager about the quality of the final product. However 
there is often no formal attempt, in practice, to combine these evidences together into 
a single quality model. 

A holy grail of software quality control could be the identification of one simple 
internal product measurement that provides an advanced warning of whether or not 
the goals for the external product characteristics will be achieved. Unfortunately, in 
software engineering the causal relationships between internal and external quality 
characteristics are rarely straightforward. We will illustrate this with one simple 
example. More detailed analyses of naive regression models for software engineering 
can be found in [3] and [4].

Suppose we have a product that has been developed using a set of software modules.
A certain number of defects will have been found in each of the software modules 
during testing. Perhaps we might assume that those modules that have the highest 
number of defects during testing would have the highest risk of causing a failure once 
the product was in operation? That is, we might expect to see a relationship similar to 
that shown in figure 1.

What actually happens? It is hard to be categorical. However, two published studies 
indicate quite the opposite effect - those modules that were most problematic pre- 
release had the least number of faults associated with them post-release. Indeed, many 
of the modules with a high number of defects pre-release showed zero defects post- 
release. This effect was first demonstrated by [1], and replicated by [4]. Figure 2 is an 
example of the sort of results they both obtained. 

So, how can this be? The simple answer is that the number of faults found pre-release gives
absolutely no indication of the level of residual faults unless the prediction is 
moderated by some measure of test effectiveness. In both of the studies referenced, 
those modules with the highest number of defects pre-release had had all their defects 
“tested out”. In contrast, many of the modules that had few defects recorded against 
them pre-release clearly turned out to have been poorly tested - they were significant 
sources of problems in the final implemented system. 
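The effect can be reproduced with a small synthetic simulation. The Python sketch below is an illustration only: the ranges for defects present and test effectiveness are invented, and the data are not those of the studies cited above. Each module is given a number of defects and a test effectiveness; the defects not found pre-release are counted as post-release defects.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_modules(n_modules=200, seed=0):
    """Return (pre_release, post_release) defect counts for synthetic modules."""
    rng = random.Random(seed)
    pre, post = [], []
    for _ in range(n_modules):
        present = rng.randint(80, 120)            # hypothetical defects present
        effectiveness = rng.uniform(0.1, 0.95)    # hypothetical test effectiveness
        found = round(present * effectiveness)    # defects found during testing
        pre.append(found)
        post.append(present - found)              # residual defects after release
    return pre, post
```

When test effectiveness varies much more across modules than the number of defects present, `pearson(*simulate_modules())` is negative: well-tested modules show many pre-release and few post-release defects, and poorly tested modules show the reverse.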

3 The Need for Causal Modelling 

The fundamental difficulty with the use of naive regression models for software 
quality assessment is that although they may be used to explain a data set obtained in 
a specific context, they cannot be used to manage a software development process. 
For this, we need to identify the causal influences on the attribute we are interested in. 
The example from the preceding section was a case in point. We cannot make 
management decisions about the quality of software from defect data alone. We must 
also take into account, at least, the effectiveness with which the software has been 
tested. 




Fig. 3: A simple graphical model that provides greater explanatory power 
than a naive regression model. 

Figure 3 provides a slightly more comprehensive model. “Defects Present” is the
attribute we are interested in. This will have a causal influence on the number of
“Defects Detected” during testing. “Test Effectiveness” will also have a causal
influence on the number of defects detected and fixed. As we will see later, this will
turn out to be a fragment of a much larger model, with the node representing defects
present being a synthesis of a number of factors including, for example, review 
effectiveness, developer’s skill level, quality of input specifications and resource 
availability. 
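The three-node fragment can be written down as a tiny discrete Bayesian network and queried by enumeration. The Python sketch below is a minimal illustration: the two-state discretisation and every probability value are hypothetical and are not taken from the network described in this paper.

```python
from itertools import product

# Hypothetical priors over two-state variables.
P_DEFECTS_PRESENT = {"low": 0.5, "high": 0.5}
P_TEST_EFFECTIVENESS = {"low": 0.5, "high": 0.5}

# Hypothetical P(Defects Detected = high | Defects Present, Test Effectiveness).
P_DETECTED_HIGH = {
    ("high", "high"): 0.9,
    ("high", "low"): 0.2,
    ("low", "high"): 0.1,
    ("low", "low"): 0.02,
}


def joint(present, effectiveness, detected):
    """Joint probability of one full assignment of the three variables."""
    p_high = P_DETECTED_HIGH[(present, effectiveness)]
    p_detected = p_high if detected == "high" else 1.0 - p_high
    return P_DEFECTS_PRESENT[present] * P_TEST_EFFECTIVENESS[effectiveness] * p_detected


def posterior_defects_high(detected, effectiveness=None):
    """P(Defects Present = high | Defects Detected [, Test Effectiveness])."""
    levels = [effectiveness] if effectiveness is not None else list(P_TEST_EFFECTIVENESS)
    numerator = sum(joint("high", te, detected) for te in levels)
    denominator = sum(
        joint(dp, te, detected) for dp, te in product(P_DEFECTS_PRESENT, levels)
    )
    return numerator / denominator
```

With these numbers, few detected defects leave the probability of many defects present at about 0.45 when test effectiveness is low, against 0.1 when it is high. This is the moderation by test effectiveness that defect counts alone cannot express.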

4 A Probabilistic Model for Software Defect Prediction 

These discussions lead us naturally to considering the use of Bayesian networks for 
quality control in software development. They have many advantages: 

1. They can easily model causal influences between variables in a specified domain;

2. The Bayesian approach enables statistical inference to be augmented by expert 
judgement in those areas of a problem domain where empirical data is sparse; 

3. As a result of the above, it is possible to include variables in a software reliability 
model that correspond to process as well as product attributes; 

4. Assigning probabilities to reliability predictions means that sound decision 
making approaches using classical decision theory can be supported. 

We have built a module level defect estimation model, and evaluated it against real 
project data. Since this was a research activity, resources were not available to 
perform extensive knowledge elicitation with the active and direct involvement of 
members of Philips’ Lines of Businesses (LoBs). Philips Research Laboratory’s 
experience from working directly with LoBs was used as a surrogate for this. 
Although this meant that the probabilistic network could be built within a relatively 
short period of time, the fact that the probability tables were in effect built from 
“rough” information sources and strengths of relations necessarily limits the precision 
of the model. However, as will be seen, the resulting model has proven to be quite 
accurate. 

4.1 Overall Structure of the Bayesian Network 

The probabilistic network is executed using the generic probabilistic inference engine 
Hugin (see http://www.hugin.com for further details). However, the size and 
complexity of the network were such that it was not realistic to attempt to build the 
network directly using the Hugin tool. Two of the authors (Fenton and Neil) have 
been actively developing tools and techniques to assist with the development of large- 
scale Bayesian networks [7]. As a result we were able to use two methods and tools, 
built on top of Hugin, to tackle effectively this otherwise intractable task: 

• The SERENE method and tool [8], which enables: large networks to be built up 
from smaller ones in a modular fashion; and, large probability tables to be built 
using pre-defined mathematical functions and probability distributions. 

• The IMPRESS method and tool [6], which extends the SERENE tool by enabling 
users to generate complex probability distributions simply by drawing 
distribution shapes in a visual editor. 

The resulting network takes account of a range of product and process factors from
throughout the lifecycle of a software module. Because of the size of the model, it is
impractical to display it in a single figure. Instead, we provide a first schematic view
in terms of sub-nets (Figure 4). This modular structure is the actual decomposition
that was used to build the network using the SERENE tool.

Fig. 4: Overall network structure.

The arc labels in Figure 4 represent ‘joined’ nodes in the underlying sub-nets. This
means that information about the variables representing these joined nodes is passed
directly between sub-nets. For example, the specification quality and the defect
density sub-nets are joined by an arc labelled ‘Module size’. This node is common to
both sub-nets. As a result, information about the module size arising from the
specification quality sub-net is passed directly to the defect density sub-net. We refer
to ‘Module size’ as an ‘output node’ for the specification quality sub-net, and an
‘input node’ for the defect density sub-net.

4.2 Some Comments on the Details of the Network 

There is insufficient space to describe the details of all the sub-nets. However, we 
show the Specification quality sub-net in Figure 5 as an example. This can be 
explained as follows. Specification quality is influenced by three major factors: 

• intrinsic complexity of the module (that is, the inherent complexity of the 
problem to be solved); 

• the internal resources used, which are in turn factored into staff quality (or 
experience), the input document quality, and the schedule constraints; 

• the stability of the requirements, which are in turn dependent on the extent of 
stakeholder involvement and the novelty of the problem. 




Fig. 5: Specification quality sub-net. A dashed border indicates a node that is shared with 
another sub-net. 

The complete network models the entire development and testing life-cycle of a 
typical software module. We believe it contains all the critical causal factors at an 
appropriate level of granularity, at least within the context of software development 
within Philips. It contains 65 nodes, many with 3-5 ordinal values, but several having
continuous scales.

The node probability tables (NPTs) were built by eliciting probability distributions
based on experience from within Philips. Some of these were based on historical
records, others on subjective judgements. For most of the non-leaf nodes of the
network the NPTs were too large to elicit all of the relevant probability distributions
using expert judgement. Hence we used the novel techniques that have been
developed recently on the SERENE and IMPRESS projects [6, 8, 9], to extrapolate all
the distributions based on a small number of samples.

Consider, for example, the node for specification quality in Figure 5. This has three 
parent nodes, two with 5 values and one with 4. Consequently, for each value of 
specification quality we need to define 100 probabilities. Instead of eliciting these 
directly, we elicit a sample of distributions (including ‘extreme’ values) and then 
extrapolate distributions for all intermediate values. 
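The arithmetic behind the figure of 100 is the product of the parent cardinalities, 5 × 5 × 4. The Python sketch below computes it and also shows one simple way of filling in an intermediate distribution from two elicited extremes. The linear blend in `interpolate_distribution` is an assumption chosen for illustration; it is not the SERENE or IMPRESS technique, and the example distributions are invented.

```python
from math import prod


def parent_configurations(parent_cardinalities):
    """Number of parent-state combinations an NPT must cover."""
    return prod(parent_cardinalities)


def interpolate_distribution(p_worst, p_best, t):
    """Blend two elicited distributions; t=0 gives p_worst, t=1 gives p_best.

    Linear blending is an illustrative assumption, not the method of the paper.
    """
    mixed = [(1.0 - t) * w + t * b for w, b in zip(p_worst, p_best)]
    total = sum(mixed)
    return [m / total for m in mixed]   # renormalise to guard against rounding
```

For example, `parent_configurations([5, 5, 4])` returns 100, and `interpolate_distribution([0.7, 0.2, 0.1], [0.1, 0.2, 0.7], 0.5)` gives a distribution halfway between the two hypothetical extremes.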

By applying numerous consistency checks we believe that the resulting NPTs are a 
fair representation of experience within Philips. 

As it stands, the network can be used to provide a range of predictions and “what-if” 
analyses at any stage during software development and testing. It can be used both for 
quality control and process improvement. However, two further areas of work were 
needed before the tool could be considered ready for extended trials. Firstly and most 
importantly, the network needed to be validated using real-world data. Secondly a 
more user-friendly interface needed to be engineered so that (a) the tool did not 
require users to have experience with probabilistic modelling techniques, and (b) a 
wider range of reporting functions could be provided. The validation exercise will be 
described in the next section in a way that illustrates how the probabilistic network 
was packaged to form the AID tool (AID for “Assess, Improve, Decide”). 

5 Validation of the Defect Estimation Tool AID 

The Philips Software Centre (PSC), Bangalore, India, made validation data available. 
We gratefully acknowledge their support in this way. PSC is a centre for excellence 
for software development within Philips, and so data was available from a wide 
diversity of projects from the various Business Divisions within PSC. 

Data was collected from 31 projects from three Business Divisions: Mainstream 
Consumer Electronics, Philips Medical Systems and Digital Networks. This gave a 
spread of different sizes and types of projects. Data was collected from three sources: 

• Pre-release and post-release defect data was collected from the “Performance 
Indicators” database. 

• More extensive project data was available from the Project Database. 

• Completed questionnaires on selected projects. 

In addition, the network was demonstrated in detail on a one to one basis to five 
experienced quality/test engineers to obtain their reaction to its behaviour under a 
number of hypothetical scenarios. 

The network was used to make predictions of numbers of defects found during unit 
test, integration test and independent testing of the module once it had been integrated 
into a product. These predictions were compared against the actual values obtained. 
Full data was only available from ten of the projects (that is, including data from 
independent testing). Consequently, insufficient data was available to perform a full 
statistical validation. However, with the exception of two projects the median value 
for the number of defects predicted at each testing phase was always within 10% of 
the actual value. The two exceptional projects involved significant elements of user 
interface design. These sorts of projects typically generate large numbers of change 
requests as the details of the UI design are clarified, so we feel that they fall outside of 
the scope of our model as it currently stands. 

One of the major values of AID is as a tool for exploring the possible consequences of 
changes to a software process, or the constraints on a product’s development. The 
ability of Bayesian networks to handle quite complex reasoning patterns is one of the 
reasons why the tool is proving so successful in this regard. We end with one 
example, which also illustrates how our model handles the sort of effects that 
were discussed in Section 2. 

Table 1 lists the median values of “Defects found at Unit Test” and “Defects 
Delivered” for a variety of values for the intrinsic problem complexity of the software 
module under development. Consider the first row first: the predictions for the number 
of defects found during unit test. For a very simple module, we get an increase in the 
number of defects found over the prior, and a decrease for a very complex module. 





                              Intrinsic Complexity of the Software Module 

                              Prior    “Very Simple”    “Very Complex” 

Defects found in Unit Test     90          125               30 

Defects delivered              50           30               70 



At first sight, this seems counter-intuitive: we might expect simpler modules to be 
more reliable. The explanation is that the more complex modules are harder to test 
than the simpler modules. With their greater ability to “hide” faults, fewer faults will 
be detected unless there is a compensating increase in the effectiveness with which 
the module is tested. No such compensation has been applied in this case and the low 
prediction for defects detected and fixed for the “very complex” case indicates that 
typically such modules are relatively poorly tested. 

This is borne out when we look at the respective figures for residual defects delivered, 
in the second row of the table. Now we see a reversal. The prediction for the “very 
complex” module indicates that it will contain more residual defects than the “very 
simple” module (a median of 70, compared to a median of 30). So our model 
naturally produces the qualitative behaviour of the real world data from our earlier 
experiment. That is, the better-tested modules yield more defects during unit test and 
deliver fewer defects. For the more poorly tested modules, the converse is the case. 
(Note that the table omits data from the Integration and Independent Test Phases. 
When this is included, the total number of defects, found plus delivered, is greatest 
for the “Very Complex” module.) 

6 Conclusion 



We started with an introduction to the problem of quality control in software 
development. Our hypothesis was that this was a domain where use of state-of-the-art 
techniques for reasoning under uncertainty could provide significant added value. We 
have ended up with a remarkably accurate software defect prediction model that also 
has tremendous potential for exploring the consequences of diverse software 
development scenarios. 

Due to space limitations, we have had to strike a balance between focussing on 
discussing the motivation for using advanced techniques for reasoning under 
uncertainty in the software engineering domain, and providing detail on our solution 
to this problem. An extended discussion of the method of construction of the network 
and the validation experiments so far performed can be found in [5], which can be 
obtained from any of the authors. 

Abstract. The present paper deals with spatial information revision in 
geographical information systems (GIS). These systems use incomplete 
and uncertain information, so inconsistency can result and the 
definition of revision operations is required. Most of the proposed belief 
revision operations are characterized by a high complexity, and since GIS 
use large amounts of data, adjustments of existing strategies are necessary. 
Taking advantage of the specificity of spatial information allows us to 
define heuristics which speed up the general algorithms. We illustrate 
some suitable adjustments on three approaches to revision: binary decision 
diagrams, preferred models and Reiter’s algorithm for diagnosis. We 
formally compare them and we test them experimentally on a real application. 
In order to deal with huge amounts of data we propose a divide and revise 
strategy in the case where inconsistencies are local. 



1 Introduction 

Geographic information systems (GIS) deal with incomplete and uncertain in- 
formation. Since the data come from different sources characterized by various 
data qualities, these data may conflict and require belief revision operations. 

In knowledge representation for artificial intelligence, one tries to represent 
a rational agent’s perceptions and beliefs. Since, most of the time, the agent faces 
incomplete, uncertain and inaccurate information, he needs a revision operation 
in order to manage his belief change in the presence of a new item of information. 
The agent’s epistemic state represents his reasoning process, and belief revision 
consists in modifying his initial epistemic state in order to maintain consistency, 
while keeping the new information and removing as little previous information 
as possible. Most of the logical approaches have been developed at the theoretical 
level, except for a few applications [15], and it turns out that in the propositional 
case the theoretical complexity of revision is $\Pi_2^p$ [6] [9]. In other respects, we 
deal with geographic information characterized by a huge amount of data, and at 
first glance there seems to be no hope of performing revision in the context of GIS. 
However, we show in this paper that it is possible to identify a tractable class 
of problems for which revision can be performed with reasonable complexity. 
Effective implementation of revision operations requires a suitable adjustment of 
existing strategies, and we have to take advantage of the spatial knowledge 
representation in order to define heuristics which speed up the general algorithms. 

In this paper we propose three different approaches to revision in the 
context of GIS. The first one stems from ROBDD¹, the second one is based on 
preferred models computation and the third one is an adaptation of Reiter’s 
algorithm for diagnosis. We show that these three approaches formally give equivalent 
results and that the three defined revision operations satisfy the KM postulates. We 
then conduct an experimental comparison. This experimental study is carried out 
on a real application developed by CEMAGREF² about the flooding of the 
Hérault river (France) in order to provide a better understanding of the flooding 
phenomenon [11]. The area is segmented into compartments; a compartment is 
a spatial entity in which the water height is considered to be constant. We first 
compare the three approaches on the maximal number of compartments they 
can deal with in reasonable time. Because of the large size of the data, we then 
define a divide and revise strategy and propose an algorithm for the case where the 
inconsistencies are supposed to be local. 

The paper is organized as follows. After some preliminaries in Sect. 2, we 
first present in Sect. 3 the three approaches to revision: according to ROBDD, 
according to preferred models and according to Reiter’s algorithm for diagnosis. 
We then show that these three revision operations are formally equivalent and 
satisfy the KM postulates. In Sect. 4 we perform an experimental study and 
describe the divide and revise strategy, before concluding in Sect. 5. 

2 Preliminaries 

In the following we use the usual notations for propositional logic. $W$ denotes the 
set of interpretations; we write $\omega \models \alpha$ to specify that $\omega$ is a model of a 
propositional formula $\alpha$, and $Mod(\alpha)$ denotes the set of models of $\alpha$. 

Belief revision has been successfully characterized by Alchourrón, Gärdenfors 
and Makinson [1], who provided a set of rationality postulates for epistemic states 
which consist in belief sets representing an agent’s current beliefs. Katsuno and 
Mendelzon [7] reformulated these postulates for epistemic states represented by 
a single propositional formula $\psi$, where any formula entailed by $\psi$ is part of the 
belief set. Let $\psi$, $\varphi$ and $\mu$ be propositional formulas; the postulates are: 

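(R1) $\psi \circ \mu$ implies $\mu$. 

(R2) If $\psi \wedge \mu$ is satisfiable, then $\psi \circ \mu \equiv \psi \wedge \mu$. 

(R3) If $\mu$ is satisfiable, then so is $\psi \circ \mu$. 

(R4) If $\psi_1 \equiv \psi_2$ and $\mu_1 \equiv \mu_2$, then $\psi_1 \circ \mu_1 \equiv \psi_2 \circ \mu_2$. 

(R5) $(\psi \circ \mu) \wedge \varphi$ implies $\psi \circ (\mu \wedge \varphi)$. 

(R6) If $(\psi \circ \mu) \wedge \varphi$ is satisfiable, then $\psi \circ (\mu \wedge \varphi)$ implies $(\psi \circ \mu) \wedge \varphi$. 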

¹ Reduced Ordered Binary Decision Diagrams. 

² French research center on water and forest management. 

A revision operation satisfying the AGM postulates is equivalent to a set of 
total pre-orders between interpretations of the propositional calculus. Katsuno 
and Mendelzon [7] define a faithful assignment as a function that 
assigns to each formula $\psi$ a total pre-order on $W$, denoted $\leq_\psi$, satisfying the 
following properties: (1) if $\omega_1, \omega_2 \models \psi$ then $\omega_1 =_\psi \omega_2$; (2) if $\omega_1 \models \psi$ and 
$\omega_2 \not\models \psi$ then $\omega_1 <_\psi \omega_2$; (3) if $\psi \equiv \varphi$ then $\leq_\psi = \leq_\varphi$. They obtain a 
representation theorem: 

Theorem 1. A revision operator $\circ$ satisfies postulates (R1)-(R6) iff there exists 
a faithful assignment that maps each formula $\psi$ to a total pre-order $\leq_\psi$ such that 
$Mod(\psi \circ \mu) = \min(Mod(\mu), \leq_\psi)$. 

3 Different Approaches 

As mentioned in the introduction, we want to take advantage of the spatial 
knowledge representation. We consider an area which is segmented into regions; 
measurements on regions can be encoded in propositional mono-literal clauses 
(set $S_1$). Topological relations between regions produce binary clauses (set $S_2'$). 
The fact that spatial entities are part of regions can be represented by domain 
constraints, which are encoded in an n-ary clause and $n(n-1)/2$ mutual exclusion 
binary clauses (set $S_2''$). Let $S_2 = S_2' \cup S_2''$. We suppose that $S_1$ and $S_2$ are 
consistent and that $S_1 \cup S_2$ is inconsistent. Since $S_2$ is more reliable than $S_1$, we 
revise $S_1$ by $S_2$. Our revision strategy stems from the determination of minimal 
subsets of clauses to drop in order to restore consistency. More formally, 
we generalize the notion of removed set for a set of clauses revised by a set of 
clauses, previously introduced in [10] for a set of clauses revised by a single 
clause. 

Definition 1. A removed set $R$ for the revision operation $S_1 \circ S_2$ is the smallest 
subset of clauses to remove from $S_1$ such that $((S_1 \cup S_2) \setminus R)$ is consistent. 

Consequence 1. If $R$ is a removed set for the revision operation $S_1 \circ S_2$, then 
$\forall c_i \in R$, $((S_1 \cup S_2) \setminus R) \models \neg c_i$. 

Consequence 2. $R \subseteq S_1$ is a removed set for the revision operation $S_1 \circ S_2$ 
iff $R$ is a minimal set (according to cardinality) such that $(S_2 \wedge (S_1 \setminus R))$ is 
consistent. 
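
To make the notion of removed set concrete, the following is a minimal brute-force sketch following Consequence 2. It is not one of the three algorithms compared in this paper; the integer encoding of literals and the function names are illustrative assumptions, and the exhaustive search is only usable on toy examples.

```python
from itertools import combinations, product

# Assumed encoding: a clause is a frozenset of non-zero ints,
# where k stands for variable k and -k for its negation.

def is_consistent(clauses):
    """Brute-force satisfiability test (only suitable for tiny examples)."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def removed_sets(s1, s2):
    """All smallest subsets R of s1 such that s2 together with s1 - R is consistent."""
    s1, s2 = list(s1), list(s2)
    for size in range(len(s1) + 1):
        found = [set(r) for r in combinations(s1, size)
                 if is_consistent([c for c in s1 if c not in r] + s2)]
        if found:
            return found  # first size with a solution is the minimal cardinality
    return []

# Two measurements (x1 true, x2 true) against a constraint forbidding both.
c1, c2 = frozenset({1}), frozenset({2})
constraint = frozenset({-1, -2})
print(removed_sets([c1, c2], [constraint]))
```

In this toy case either measurement can be dropped, so there are two removed sets of cardinality one.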

3.1 Revision Using ROBDD 

A BDD (Binary Decision Diagram) [4] represents a boolean function (or formula) 
$\psi$ using a labeled directed acyclic graph. The graph has two sink vertices labeled 0 
and 1, representing the constant boolean functions 0 and 1 respectively. Each 
non-sink vertex is labeled with a boolean variable $v$ and has two out-edges labeled 
then and else. The then child corresponds to the case where $v = 1$ and the else 
child corresponds to the case where $v = 0$. Given an order on the variables, an 
ordered BDD (OBDD) is a BDD with the constraint that all paths from the source to 
a sink visit the variables in ascending order. A reduced ordered BDD, or ROBDD, 
is an OBDD which has been transformed by deleting some nodes; it may be 
viewed as a compressed decision tree for a propositional formula $\psi$. Each path 
from the source to a sink represents an interpretation of $\psi$, and conversely each 
interpretation of $\psi$ corresponds to a path from the source to a sink. A model 
(resp. a countermodel) of $\psi$ corresponds to a path from the source to the sink 1 
(resp. 0). 

The construction of a ROBDD associated with the set of clauses $S_1 \cup S_2$ leads 
to a ROBDD with all paths going to the sink 0, since $S_1 \cup S_2$ is inconsistent. In order 
to solve this problem, we use the transformation introduced by [8] and [3]. Each 
clause $c$ of $S_1$ is replaced by the formula $\phi_c \to c$, where $\phi_c$ is a new variable. If 
$\phi_c$ is assigned true then $\phi_c \to c$ is true iff $c$ is true; this enforces the clause $c$. In 
contrast, if $\phi_c$ is assigned false then $\phi_c \to c$ is true whatever the truth value of 
$c$; the clause $c$ is not taken into account. More formally: 

Definition 2. Let $C$ be a set of clauses. $\mathcal{H}(C)$ denotes the set of clauses 
resulting from the transformation described above, and $H_C$ denotes the set of the 
newly introduced variables. We define a mapping $\sigma$ from $C$ to $\mathcal{H}(C)$ such that 
$\forall c \in C$, $\sigma(c) = \phi_c \to c$, and a mapping $\sigma_{var}$ from $\mathcal{H}(C)$ to $H_C$ such that 
$\forall (\phi_c \to c) \in \mathcal{H}(C)$, $\sigma_{var}(\phi_c \to c) = \phi_c$. 

We construct a ROBDD associated with $(S_2 \cup \mathcal{H}(S_1))$, and minimizing the number 
of clauses to remove from $S_1$ amounts to minimizing the number of new variables 
$\phi_c$ assigned false. To each path in the ROBDD we assign a cost as follows: for 
each variable in $H_{S_1}$ the cost is 0 for the then branch and 1 for the else branch, 
and it is 0 for all the other variables. The cost of a path $p$ is the sum of the costs 
of the visited branches. Minimizing the number of clauses to remove from $S_1$ 
amounts to finding in the ROBDD representing $(S_2 \cup \mathcal{H}(S_1))$ a path $p$ from the source 
to the sink 1 with minimal cost. More formally: 

Definition 3. Let $C$ be a set of clauses and consider the ROBDD representing $\mathcal{H}(C)$. 
Let $p$ be a path from the source to a sink. We define $H_C^{+}(p)$ (resp. $H_C^{-}(p)$) as 
the set of new variables that $p$ visits in the then branch (resp. the else branch). 

Theorem 2. Let $R \subseteq S_1$. $R$ is a removed set for the revision operation $S_1 \circ_{robdd} S_2$ 
iff there exists a complete path $p$ in the ROBDD representing $(S_2 \cup \mathcal{H}(S_1))$ to 
the sink 1 with minimal cost and such that $\sigma_{var}(\sigma(R)) = H_{S_1}^{-}(p)$. 
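
The effect of the transformation and of the path cost can be illustrated without building a diagram. The sketch below is an assumption-laden stand-in: it replaces the ROBDD shortest-path search by exhaustive enumeration of the models of $(S_2 \cup \mathcal{H}(S_1))$, counting one unit of cost per new variable assigned false. The integer literal encoding and all function names are hypothetical.

```python
from itertools import product

def add_selectors(s1, first_selector):
    """Replace each clause c of s1 by (phi_c -> c), i.e. the clause (not phi_c or c)."""
    selectors, transformed = {}, []
    for offset, clause in enumerate(s1):
        phi = first_selector + offset          # fresh variable for this clause
        selectors[phi] = frozenset(clause)
        transformed.append(frozenset(clause) | {-phi})
    return transformed, selectors

def min_cost_models(s1, s2):
    """Return (minimal cost, sets of dropped clauses reaching that cost)."""
    variables = sorted({abs(l) for c in list(s1) + list(s2) for l in c})
    h_s1, selectors = add_selectors(s1, max(variables, default=0) + 1)
    clauses = h_s1 + [frozenset(c) for c in s2]
    all_vars = variables + sorted(selectors)
    best_cost, best = None, []
    for values in product([False, True], repeat=len(all_vars)):
        a = dict(zip(all_vars, values))
        if not all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses):
            continue                            # not a model: path ends in sink 0
        dropped = frozenset(selectors[phi] for phi in selectors if not a[phi])
        cost = len(dropped)                     # one unit per selector set to false
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, [dropped]
        elif cost == best_cost and dropped not in best:
            best.append(dropped)
    return best_cost, best

c1, c2 = frozenset({1}), frozenset({2})
cost, dropped_sets = min_cost_models([c1, c2], [frozenset({-1, -2})])
print(cost, dropped_sets)
```

On this toy input the minimal cost is 1, and the clauses whose new variable is false in a minimal-cost model form exactly the removed sets described by Theorem 2.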

3.2 Revision Using Preferred Models (MPL) 

[2], [5] provide an algorithm, called MPL, to compute the preferred models of 
a set of propositional formulas. This algorithm stems from the definition of 
a preference relation between models, in order to only compute the preferred 
models. The preference relation can be built on a subset of propositional variables 
and we use it to compute the removed sets. More formally: 

Definition 4. Let $\mathcal{D}$ be a set of propositional variables; $lit(\mathcal{D})$ denotes the set 
of literals of $\mathcal{D}$. 

- A partial $\mathcal{D}$-interpretation $IP$ is a set of non-contradictory literals such that 
$IP \subseteq lit(\mathcal{D})$. 

- A $\mathcal{D}$-interpretation is a partial $\mathcal{D}$-interpretation $IP$ such that $\forall x \in \mathcal{D}$ either 
$x \in IP$ or $\neg x \in IP$, and $I_\mathcal{D}$ denotes the set of $\mathcal{D}$-interpretations. 

- Let $IP$ be a partial $\mathcal{D}$-interpretation; $Ext(IP, \mathcal{D}) = \{I \in I_\mathcal{D} \mid IP \subseteq I\}$ 
denotes the set of interpretations which extend $IP$. 

- Let $C$ be a set of clauses and let $I$ be a $\mathcal{D}$-interpretation; $I$ is a $\mathcal{D}$-model of 
$C$ iff there exists a model $M$ of $C$ such that $I \subseteq M$. 



Definition 5. Let $\mathcal{D}$ be a set of propositional variables; $lit(\mathcal{D})$ denotes the set 
of literals of $\mathcal{D}$. 

- $lit(\mathcal{D}) = \mathcal{P} \cup \mathcal{NP}$, where $\mathcal{P}$ is the set of preferred literals and $\mathcal{NP}$ is the set 
of non-preferred literals. 

- Let $IP_1$ and $IP_2$ be two partial $\mathcal{D}$-interpretations; $IP_1$ is preferred to $IP_2$ iff 
the set of non-preferred literals of $IP_1$ is included in the set of non-preferred 
literals of $IP_2$. This is denoted by $IP_2 \sqsubseteq IP_1$. 

The defined order $\sqsubseteq$ has a maximal element which is the $\mathcal{D}$-model $\mathcal{P}$ and a 
minimal element which is the $\mathcal{D}$-model $\mathcal{NP}$ [2]. 

Proposition 1. Let $IP$ be a partial $\mathcal{D}$-interpretation. $Ext(IP, \mathcal{D})$ represents 
the set of extensions of $IP$ in $\mathcal{D}$, and $Ext(IP, \mathcal{D})$ only has one maximal element 
for $\sqsubseteq$ [5]. 

The Davis and Putnam algorithm enumerates some of the $\mathcal{D}$-interpretations. Using 
the preference relation between literals, these $\mathcal{D}$-interpretations are ordered; 
consequently the first $\mathcal{D}$-interpretation satisfying the set of clauses is a preferred 
$\mathcal{D}$-model, denoted $M_P$. In order to eliminate the non-preferred $\mathcal{D}$-models, the 
initial set of clauses is modified by the addition of a clause consisting of the 
negation of all the non-preferred literals of $M_P$. 

We now show how we adapt the MPL algorithm. As in the previous approach, 
we construct a new set of clauses $\mathcal{H}(S_1)$, replacing each clause $c$ of $S_1$ by the 
formula $\phi_c \to c$. The new variables $\phi_c$ of the set $H_{S_1}$ play the same part as 
previously: they enforce the clause $c$. A preference relation is defined by the 
preference of literals of the clauses of $H_{S_1}$. Minimizing the number of clauses to 
remove from $S_1$ amounts to selecting among the preferred $lit(H_{S_1})$-models those of 
minimal cardinality. More formally: 

Definition 6. Let $L$ be a finite set of literals. $n^-(L)$ (resp. $n^+(L)$) denotes the 
set of negative literals (resp. positive literals) of $L$. 



And the following theorem holds: 

Theorem 3. Let $R \subseteq S_1$ and let $MP$³ be the set of the preferred $H_{S_1}$-models 
of $(S_2 \cup \mathcal{H}(S_1))$. $M_R$ denotes the $H_{S_1}$-interpretation such that $\forall c \in R$, 
$\neg\sigma_{var}(\sigma(c)) \in M_R$ and $\forall c \in S_1 \setminus R$, $\sigma_{var}(\sigma(c)) \in M_R$. $R$ is a removed set 
for the revision operation $S_1 \circ_{mpl} S_2$ iff $M_R \in MP$ and $\forall M' \in MP$ such that 
$M' \neq M_R$, $n^-(M') \geq n^-(M_R)$. 

3.3 Revision Using Reiter’s Algorithm (REM) 

In [17] Würbel et al. presented a detailed description of the adaptation of Reiter’s 
algorithm for diagnosis [12]. In order to perform the comparison with the two 
previous revision operations, we briefly recall the approach. 

Definition 7. Let $F$ be a collection of sets. A hitting set of $F$ is a set $H \subseteq \bigcup_{S \in F} S$ 
such that $\forall S \in F$, $H \cap S \neq \emptyset$. 

$H$ is a minimal hitting set of $F$ iff $H$ is a hitting set of $F$ and $\forall H' \subset H$ with 
$H' \neq \emptyset$, $H'$ is not a hitting set of $F$. 

We denote by $\mathcal{N}(F)$ the set of minimal hitting sets of a collection of sets. Our 
revision strategy is to first compute the minimal hitting sets of the collection of 
inconsistent subsets of $S_1 \cup S_2$, denoted by $I(S_1 \cup S_2)$, using an adaptation of 
Reiter’s algorithm [12,14], and then to order the minimal hitting sets in order to 
select only one of them. 

Proposition 2. There exists $N \in \mathcal{N}(I(S_1 \cup S_2))$ such that $N \cap S_2 = \emptyset$. 

We first establish the correspondence between minimal hitting sets and removed 
sets as follows: 

Theorem 4. Let $R \subseteq S_1$. $R$ is a removed set for the revision operation $S_1 \circ_{rem} S_2$ 
iff $R$ is a minimal hitting set of minimal cardinality for the collection of 
inconsistent subsets of $S_1 \cup S_2$ and $R \cap S_2 = \emptyset$. 

We recall the adaptation of Reiter’s algorithm in order to compute the minimal 
hitting sets. 

REM algorithm. Computation of $\mathcal{N}_{S_1}(I(S_1 \cup S_2))$. Let $I(S_1 \cup S_2)$ be the collection of 
the inconsistent subsets of $S_1 \cup S_2$. The tree of the construction of the collection 
of minimal hitting sets, denoted $T$, is the smallest tree satisfying the following 
properties: 

- its root is labeled by “√” if $I(S_1 \cup S_2) = \emptyset$, otherwise its root is labeled by 
an element of $I(S_1 \cup S_2)$. 

- if $n$ is a node in $T$ we define $H(n)$ as the set of branch labels on the path 
from the root to $n$. If $n$ is labeled by “√”, it does not have any successor 
node in $T$. If $n$ is labeled by a set $S \in I(S_1 \cup S_2)$, then for each $\sigma \in S$ 
such that $\sigma \in S_1$ (according to Prop. 2) $n$ has a successor node $n_\sigma$ linked to 
$n$ by a branch labeled by $\sigma$. $n_\sigma$ is labeled by a set $S' \in I(S_1 \cup S_2)$ such that 
$S' \cap H(n_\sigma) = \emptyset$ if such a set $S'$ exists; if there is no such $S'$, $n_\sigma$ is labeled by 
“√”. 

³ $MP$ is obtained using the MPL algorithm with the set of clauses $(S_2 \cup \mathcal{H}(S_1))$ and 
the set of preferred literals $lit(H_{S_1})$. 



Consequence 3. The maximum depth of the computing tree of the minimal hitting 
sets of $\mathcal{N}_{S_1}(I(S_1 \cup S_2))$ is $\#S_1$. 

We then provide a refinement of the REM algorithm, denoted by REM_R, 
which only computes the minimal hitting sets of minimal cardinality. This 
refinement stems from the following observation. In the tree construction, as soon as 
we find a minimal hitting set, we only continue the tree construction in breadth, 
because if we continue the construction in depth we get a minimal hitting set which 
is not of minimal cardinality. 
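
The breadth-first idea can be sketched as follows. This is a simplified illustration, not the REM_R implementation: it assumes that the collection of inconsistent subsets is already available as an explicit list (the actual algorithm obtains node labels through consistency checks), and the function and parameter names are hypothetical.

```python
from collections import deque

def min_cardinality_hitting_sets(conflicts, removable):
    """Breadth-first hitting-set tree over an explicit list of conflict sets.

    conflicts: iterable of sets (the inconsistent subsets, assumed given).
    removable: elements allowed on branches (the clauses of S1 in the text).
    """
    conflicts = [frozenset(c) for c in conflicts]
    removable = frozenset(removable)
    results, seen = [], set()
    queue = deque([frozenset()])      # each entry is H(n), the labels on the path
    best_size = None
    while queue:
        path = queue.popleft()
        if best_size is not None and len(path) > best_size:
            break                     # deeper levels cannot be of minimal cardinality
        unhit = next((c for c in conflicts if not (c & path)), None)
        if unhit is None:             # node labeled with the check mark
            best_size = len(path)
            results.append(path)
            continue
        for element in sorted(unhit & removable, key=repr):
            child = path | {element}
            if child not in seen:     # the same label set is explored only once
                seen.add(child)
                queue.append(child)
    return results

found = min_cardinality_hitting_sets(
    [{"a", "b"}, {"b", "c"}, {"a", "c"}], removable={"a", "b", "c"})
print(found)
```

Because paths are explored level by level, the first level on which a fully hitting path appears gives the minimal cardinality, and the search stops before going deeper.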



3.4 Properties of the Revision Operations 

Proposition 3. The revision operations $\circ_{robdd}$, $\circ_{mpl}$ and $\circ_{rem}$ give the same 
results. 

The proof follows from Theorems 2, 3 and 4 [16]. We then show that the defined 
revision operations satisfy the KM postulates. We first define a total pre-order 
corresponding to a propositional formula as follows: 

Definition 8. Let $\psi$ be a formula in conjunctive normal form (CNF). Let $\omega \in W$; 
$NS_\psi(\omega)$ denotes the set of clauses appearing in the CNF $\psi$ which are falsified 
by $\omega$, and $\#NS_\psi(\omega)$ the number of such clauses. We define the total pre-order 
$\leq_\psi$ by: $\forall \omega_1, \omega_2 \in W$, $\omega_1 \leq_\psi \omega_2$ iff $\#NS_\psi(\omega_1) \leq \#NS_\psi(\omega_2)$. 



Proposition 4. The function that assigns to each formula $\psi$ the total pre-order 
$\leq_\psi$ defined in Definition 8 is a faithful assignment. 

And the following theorem holds. 

Theorem 5. The revision operation $\circ_{rem}$ satisfies the KM postulates (R1)-(R6) 
and $Mod(\psi \circ_{rem} \mu) = Min(Mod(\mu), \leq_\psi)$. 

Since, by Prop. 3, the $\circ_{robdd}$ and $\circ_{mpl}$ operations give the same results as 
$\circ_{rem}$, they satisfy the KM postulates and Theorem 5. 
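
The semantic characterization in Theorem 5 can be checked on small examples with a direct enumeration of interpretations. The following sketch is only illustrative: the integer literal encoding and the function names are assumptions, and the exhaustive enumeration is exponential in the number of variables.

```python
from itertools import product

def falsified(clauses, world):
    """Number of clauses falsified by an interpretation (#NS in Definition 8)."""
    return sum(1 for c in clauses
               if not any(world[abs(l)] == (l > 0) for l in c))

def revise_models(psi, mu, variables):
    """Models of mu that falsify the fewest clauses of psi."""
    worlds = [dict(zip(variables, vals))
              for vals in product([False, True], repeat=len(variables))]
    models_mu = [w for w in worlds if falsified(mu, w) == 0]
    if not models_mu:
        return []
    best = min(falsified(psi, w) for w in models_mu)
    return [w for w in models_mu if falsified(psi, w) == best]

# psi: x1 and x2 are both measured true; mu: they cannot both be true.
psi = [frozenset({1}), frozenset({2})]
mu = [frozenset({-1, -2})]
print(revise_models(psi, mu, [1, 2]))
```

Each resulting model falsifies exactly one clause of psi, which matches the two removed sets of cardinality one obtained for the same toy input.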

3.5 Comparison 

The advantage of using the ROBDD lies in the good time complexity of 
consistency checking. In our problem, using ROBDD amounts to looking for the 
shortest path from the source to the sink 1. The drawback is that the incremental 
construction of a ROBDD can lead to a transitory binary decision diagram 
of exponential memory size. Moreover, we have to fix some parameters in order 
to provide a good ordering on the variables. The MPL and REM algorithms do 
not require this preliminary ordering on variables. The REM_R algorithm 
gives the best results since it uses a breadth-first tree construction, and it has the 
“anytime” property, which allows some results to be re-used before the end of 
the revision process. 

4 Experimental Comparison 

We now present the experimental study of a river flooding with the aim of 
assessing the water level. 

4.1 The Application 

The tests are conducted on data provided by CEMAGREF. The area is a valley 
segmented into 200 compartments. Aerial pictures provide the first source of 
information, denoted by $S_2$, consisting of two kinds of hydraulic relations between 
adjacent compartments. A flow relation reflects the presence of a hydraulic link 
between two compartments with visible water flow. A hydrodynamic balance 
relation reflects the presence of a hydraulic link between two compartments but 
no visible flow. These relations are translated into constraints. Each constraint 
is encoded in propositional calculus by negative binary clauses representing the 
forbidden values of the constraint, and the description of the variables is encoded 
by an n-ary clause for a domain of size n and n(n-1)/2 negative binary mutual 
exclusion clauses. The second source of information comes from agricultural land 
use, denoted by $S_1$, and consists of estimations of minimal and/or maximal 
submersion heights. These estimations are translated into a set of equalities. 
Each equality is encoded by means of monoliteral clauses. For an area of 200 
compartments we deal with 37700 clauses and 3200 propositional variables. 
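
The encoding of a domain of size n by one n-ary clause and n(n-1)/2 negative binary clauses can be sketched as below. The integer literal encoding and the function name are illustrative assumptions. The example size of 16 corresponds to 3200 variables spread evenly over 200 compartments, which is our own assumption and not a figure given in the text.

```python
from itertools import combinations

def encode_domain(variables):
    """Encode 'exactly one of these variables is true' as clauses.

    Returns one n-ary clause (at least one value) followed by
    n(n-1)/2 negative binary clauses (no two values at once).
    """
    variables = list(variables)
    at_least_one = [frozenset(variables)]
    at_most_one = [frozenset({-a, -b}) for a, b in combinations(variables, 2)]
    return at_least_one + at_most_one

small = encode_domain([1, 2, 3, 4])
print(len(small))                         # 1 + 4*3/2 clauses
print(len(encode_domain(range(1, 17))))   # 1 + 16*15/2 clauses
```

The quadratic number of mutual exclusion clauses explains why the clause count grows much faster than the variable count as domains get larger.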

The experiments were performed on a PC equipped with a Pentium 
II 233 MHz processor and 128 MB of RAM. The algorithms were implemented 
in C and compiled with the egcs 1.0.2 compiler with -O2 optimizations activated. 
For the ROBDD approach the CUDD library was used. 

Since the ROBDD is very sensitive to the variable ordering, we first perform 
checks in order to provide the best variable ordering for our problem. In order to 
perform further comparison we then adopt the following methodology. We check 
the limits of each of the proposed algorithms, dealing with a certain number of 
compartments and increasing this number as far as possible. When the limits 
are reached we propose a “divide and revise” strategy: we divide the problem 
into subproblems, we solve them and we merge the solutions. 

4.2 Experimental Results 

We deal with an increasing number of compartments, from three to twelve. 
The CPU times shown in Figure 1 only go up to 8 compartments 
because for a greater number of compartments some algorithms stop, either because 
of a space limit (ROBDD, REM) or because of a CPU time limit (MPL). A refinement 
of the REM algorithm, denoted by REM_R, which only computes the minimal hitting 
sets of minimal cardinality, gives much better results than the other algorithms, 
but it is not shown in the figure for the sake of legibility. 







Fig. 1. Performance of the different approaches 



As illustrated in Figures 2 (a) and (b), although the ROBDD, MPL and 
REM algorithms formally give equivalent results, the experiments show 
that the REM algorithm is faster than the others. Using the REM_R algorithm 
it is also possible to deal with twelve compartments in a reasonable time, which 
is not possible with the other algorithms. 





(a) performance gain for the REM_R algorithm 

(b) performance of the REM_R algorithm 

Fig. 2. Performance gain for the REM_R algorithm 

4.3 Divide and Revise 

The experimental results show that it seems difficult, even using the REM_R 
algorithm, to deal with the 200 compartments all at once. The 
experiments underline the fact that the intractable groups of compartments 
are those which have numerous minimal hitting sets. Moreover, it 
turns out that the inconsistencies are not global but spatially localized. 
Therefore we divide the problem into subproblems with a reasonable number of 
compartments in order to solve the local inconsistencies. However, several questions 
arise: how to partition the set of compartments, how to be sure that solving the 
subproblems and merging the solutions restores the consistency of the 
initial problem, and how to perform the merging. The third question is still under 
investigation; nevertheless, in the case of local inconsistencies we propose a solution. 

We divide the whole area, represented by $S$, into packets of compartments of 
tractable size. The computation of the minimal hitting sets of $S$ is achieved from the 
computation of the minimal hitting sets of the packets, via the REM algorithm, and we 
propose a merging algorithm stemming from the following considerations. Since 
the division of the whole area is not necessarily a partition, some packets may 
overlap, and the determination of minimal hitting sets $R$ for the packets may not 
be sufficient to restore consistency. Therefore, we have to compute the minimal 
hitting sets for each $S \setminus R$, and the minimal hitting sets for $S$ consist of elements 
from $R$ and elements from $S \setminus R$. More formally, we now present some definitions 
in order to establish the algorithm computing the minimal hitting sets of $S$ 
according to the divide and revise strategy. 

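Definition 9 (merging). Let $S = S_1 \cup S_2$ be a set of clauses. Let $S' = S_1' \cup S_2'$ 
and $S'' = S_1'' \cup S_2''$ be two subsets such that $S_1' \subseteq S_1$, $S_1'' \subseteq S_1$, $S_2' \subseteq S_2$, 
$S_2'' \subseteq S_2$. The merging operation $\sqcup$ is defined by: $(S' \sqcup S'') = S' \cup S'' \cup 
\{c \mid c \in S, lit(c) \cap (lit(S') \cup lit(S'')) \neq \emptyset\}$ 

Definition 10. Let $\mathcal{C}_1$ and $\mathcal{C}_2$ be two collections of sets. The operation $\oplus$ is 
defined by: $\mathcal{C}_1 \oplus \mathcal{C}_2 = \{S \mid S = S_1 \cup S_2, (S_1, S_2) \in \mathcal{C}_1 \times \mathcal{C}_2\}$ 

Definition 11. Let $\mathcal{C}$ be a collection of sets and $E$ be a set. The operation $\bullet$ is 
defined by: $E \bullet \mathcal{C} = \{S \mid S = E \cup C, C \in \mathcal{C}\}$ 

Definition 12. Let $\mathcal{C}_1$ and $\mathcal{C}_2$ be two collections of sets. The operation $\vee$ is 
defined by: $\mathcal{C}_1 \vee \mathcal{C}_2 = \{S \mid S \in \mathcal{C}_1 \cup \mathcal{C}_2 \text{ and } \forall S' \in \mathcal{C}_1 \cup \mathcal{C}_2, S' \neq S, S \not\subset S'\}$ 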
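
The set operations of Definitions 10 to 12 are easy to prototype on collections of frozensets. The sketch below uses hypothetical function names. For Definition 12 it assumes the reading suggested by the completeness remark on minimality, namely that only the sets with no strict subset in the union are kept; this reading is an assumption, not a statement of the authors' definition.

```python
def combine(c1, c2):
    """Definition 10: all unions of one set from c1 with one set from c2."""
    return {s1 | s2 for s1 in c1 for s2 in c2}

def extend(e, c):
    """Definition 11: add the elements of e to every set of the collection c."""
    return {frozenset(e) | s for s in c}

def keep_minimal(c1, c2):
    """Assumed reading of Definition 12: keep the sets of the union of c1 and c2
    that have no strict subset in that union."""
    union = set(c1) | set(c2)
    return {s for s in union if not any(t < s for t in union)}

left = {frozenset({"a"})}
right = {frozenset({"b"}), frozenset({"c"})}
print(combine(left, right))
print(extend({"r"}, right))
print(keep_minimal(left, {frozenset({"a", "b"}), frozenset({"c"})}))
```

Representing each hitting set as a frozenset keeps the collections hashable, so that duplicates produced by overlapping packets collapse automatically.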

We now present the algorithm: 

Function divrev (TKB, TN) 

  TKB: vector of [1 ... n] bases of clauses; 
  TN: vector of [1 ... n] collections of minimal hitting sets; 
  TKB': vector of [1 ... n'] bases of clauses, with n' < n; 
  TN': vector of [1 ... n'] collections of minimal hitting sets; 
  LF: vector of [1 ... n'] tuples of size n giving the TKB to merge in order to 
      construct the TKB'; 
  TmpC, TmpC': temporary collections of sets; 

begin 
  if n = 1 then 
    return (TKB_1, TN_1); 
  end if 
  LF := get_merging_list (TKB); 
  for all (i_1, ..., i_m)_k ∈ LF, k ∈ [1 ... n'] do 
    TKB'_k := (TKB_{i_1} ⊔ ... ⊔ TKB_{i_m}); 
    TN'_k := ∅; 
    for all R ∈ (TN_{i_1} ⊕ ... ⊕ TN_{i_m}) do 
      TN'_k := TN'_k ∨ [R • REM (TKB'_k \ R)]; 
    end for 
  end for 
  return divrev (TKB', TN'); 
end 

The completeness of the algorithm stems from the definition of the merging 
operation $\sqcup$ between subbases, from the presence in the merged base of minimal 
hitting sets coming from the subbases, and from the minimality of the hitting 
sets coming from the $\vee$ operation. 

5 Conclusion 

In this paper, we presented a comparison between three revision approaches
stemming from a real application in the context of GIS. These approaches could
be successfully applied to other applications in geophysics, demography, etc.
Since the adaptation of Reiter’s algorithm to revision gives the best experimental
results, it could be fruitful to adapt other algorithms designed for diagnosis,
particularly real-time algorithms. In order to deal with large data sets, we
proposed a divide and revise strategy and gave an algorithm for the case where the
inconsistencies are assumed to be local. However, how to partition the area is still
an open question. Finally, coming back from practice to theory, it is interesting
to notice, as independently also shown in [13], that maxichoice revision
operations give good results for the finite case while they are too restrictive for
the deductively closed case.

Abstract. Intelligent agents have to be able to merge inputs received from different
sources in a coherent and rational way. Recently, several proposals have been
made for the merging of structures in which it is possible to encode the preferences 
of sources [5,4,12,13,14,1]. Information merging has much in common with the 
goals of social choice theory: to define operations reflecting the preferences of 
a society from the individual preferences of the members of the society. Given 
this connection it seems reasonable to require that any framework for the merg- 
ing of information has to provide satisfactory ways of dealing with the problems 
raised in social choice theory. In this paper we investigate the link between the 
merging of epistemic states and two important results in social choice theory. We 
show that Arrow’s well-known impossibility theorem [2] can be circumvented 
when the preferences of sources are represented in terms of epistemic states. This 
is achieved by providing a consistent set of properties for merging from which 
Arrow-like properties can be derived. We extend this to a consistent framework 
which includes properties corresponding to the notion of being strategy-proof. The 
existence of such an extended framework can be seen as a circumvention of the 
impossibility result of Gibbard and Satterthwaite [8,17,18] and related results [6,3].

1 Introduction 

Intelligent agents have to be able to merge inputs received from different sources in a 
coherent and rational way. Recently, several proposals have been made for the merging 
of structures in which it is possible to encode the preferences of sources. In [5,4] infor- 
mation fusion is described in terms of possibility distributions [7] and the K-framework 
developed in [21]. In [13,14], information merging is described in terms of epistemic 
states; structures in the style of [20]. In [1] the combination of preferences is described 
in a framework where preferences are represented as arbitrary binary relations. 

It has been pointed out that the merging of information is similar to the operations 
studied in social choice theory, where the aim is to provide fair and equitable methods 
for aggregating the preferences of the members of a society to produce a single relation
reflecting the preferences of society [10,11]. It thus seems reasonable to expect any
proposed framework for the merging of information to be able to deal satisfactorily with
the problems raised in social choice theory. In this paper we investigate the link between
the merging of epistemic states and two important impossibility results in social choice
theory: the Arrow impossibility theorem and the Gibbard-Satterthwaite impossibility theorem.
Arrow showed that there is no aggregation operation satisfying certain reasonable pos- 
tulates [2]. We show that the Arrow result can be circumvented when preferences are 
represented in terms of epistemic states. Informally, epistemic states assign ranks to 
the valuations, or possible worlds, of the logic under consideration. We provide a list 
of properties to be satisfied by all rational merging operations and prove that the Ar- 
row postulates, suitably modified to apply to this framework, can be derived from these 
properties. We show that these properties are consistent, thereby providing a circum- 
vention of Arrow’s result. Gibbard [8] and Satterthwaite [17,18] independently proved 
that, under certain conditions, every reasonable method to aggregate the preferences 
of members of a society is vulnerable to manipulation by the members of that society. 
One of the Gibbard-Satterthwaite conditions, single-valuedness, is quite restrictive, but 
similar impossibility results hold even in its absence [6,3]. We extend our framework 
for the merging of epistemic states by adding properties which disallow various forms 
of manipulation. In particular, we propose properties which force merging operations to 
be strategy-proof. The proof that the addition of these properties results in a consistent 
extension of the basic framework for merging can be seen as a circumvention of the 
Gibbard-Satterthwaite theorem and related results [6,3]. 

We assume a finitely generated propositional language $L$ closed under the usual
propositional connectives and with a classical model-theoretic semantics. $V$ is the set of
valuations of $L$ and $M(\alpha)$ is the set of models of $\alpha \in L$. Classical entailment is
denoted by $\vDash$. For $i \in \mathbb{N}$, we let $I(i) = \{0, \ldots, i\}$ and $I^+(i) = \{1, \ldots, i\}$.



2 Epistemic States 

In epistemic states the preferences of sources are represented as plausibility rankings of 
natural numbers on the valuations of L; the lower the number assigned to a valuation, the 
more plausible it is deemed to be. This is along the lines of work initially proposed by 
Spohn [20]. It was used in [13,14] to define merging. Epistemic states are very similar 
to possibility distributions [7] and the K-framework of Williams [21] and it is relatively 
easy to translate between these frameworks. It is possible to use epistemic states in 
various ways. In the context of merging our aim is to employ epistemic states semi- 
qualitatively. The intention is for the ranks assigned to valuations to serve merely as 
markers in order to define a notion of relative distance between valuations, and nothing 
more. This eliminates the typical problem with quantitative approaches in which it is 
usually difficult to justify a particular assignment of numbers. The advantage of the semi- 
qualitative approach is that it allows us to express the strength with which preferences 
are held; something that cannot be achieved with orderings on valuations. For example,
in an epistemic state it is possible to express the information that I prefer u to v more
than I prefer v to w.

Definition 1. An epistemic state $\Phi$ is a (total) function from $V$ to $\mathbb{N}$.

It is possible to extract a consistent classical knowledge base from an epistemic state $\Phi$
by considering only those valuations with the best level of plausibility assigned to them.
Let $M^i(\Phi) = \{v \in V \mid \Phi(v) = i\}$ and let $\min(\Phi) = \min\{\Phi(v) \mid v \in V\}$.

Definition 2. $\phi \in L$ is a knowledge base extracted from $\Phi$ iff $M(\phi) = M^{\min(\Phi)}(\Phi)$.

Observe that the knowledge bases extracted from $\Phi$ are all logically equivalent. We
shall abuse notation by referring to $B(\Phi)$ as the knowledge base extracted from $\Phi$.
The intention is that $B(\Phi)$ is some canonical representative of all the knowledge bases
extracted from $\Phi$. By extracting knowledge from an epistemic state in this way we ensure
that $B(\Phi)$ will always be satisfiable. This is in line with the stated intention of employing
epistemic states semi-qualitatively; the choice of having 0 as the best plausibility rank
which can be assigned to a valuation is purely for the sake of convenience.
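
The extraction step can be sketched in a few lines of Python. The dictionary encoding of an epistemic state, the function name `best_valuations`, and the two-atom language are assumptions chosen for this illustration only.

```python
def best_valuations(phi):
    """Return the valuations carrying the lowest (most plausible) rank in phi."""
    best = min(phi.values())
    return {v for v, rank in phi.items() if rank == best}

# Hypothetical language with atoms p and q; a valuation is written as the
# truth values of (p, q), so "10" means p true and q false.
phi = {"11": 0, "10": 1, "01": 0, "00": 2}
models_of_b = best_valuations(phi)      # exactly the valuations where q is true

# The result is never empty, even when no valuation is ranked 0.
shifted = {"11": 3, "10": 5}
models_of_shifted = best_valuations(shifted)
```

Because the minimum is always attained by at least one valuation, the extracted knowledge base is satisfiable, which is the point made in the paragraph above.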

Formally, we shall view merging as an operation in which the preferences of a 
sequence of sources, in the form of epistemic states, are combined to provide a new 
epistemic state representing the merged preferences of the sources. It is not sufficient to 
use finite sets of epistemic states, since different sources may have identical preferences, 
and the presence of more than one instance of an epistemic state may have a significant 
impact on the way in which merging takes place.

Definition 3. An epistemic list $E$ is a finite non-empty list, or sequence, of epistemic
states. We let $|E|$ denote the size of $E$.

In order for merging to be carried out at all it is crucial to make an assumption of 
commensurability, that all sources employ the same scale when they rank valuations. 
In practice this can be achieved by obtaining a worst level of plausibility P commonly 
agreed upon by all sources. Such a commitment does not mean that any of the sources 
has to rank at least one valuation at P; it simply means that this is the worst level of 
plausibility that a source would ever consider attributing to any valuation. We insist that 
P be a finite natural number. An agreement to use a particular worst level of plausibility 
means that all sources agree on a fixed level of granularity. 

Definition 4. An epistemic state $\Phi$ is P-capped, where $P \in \mathbb{N}$, iff $\Phi(v) \leq P$ for every
$v \in V$. An epistemic list $E$ is P-capped iff every epistemic state in $E$ is P-capped. The
set of all P-capped epistemic lists is denoted by $\mathcal{E}^P$. The set of all epistemic states is
denoted by $\mathcal{S}^\infty$.

This brings us to the formal definition of merging. 

Definition 5. A P-capped merging operation $\Delta$ is a function from $\mathcal{E}^P$ to $\mathcal{S}^\infty$.

Observe that P-capped merging does not necessarily yield P-capped epistemic states. 
In general it seems reasonable to expect that, at least in some cases, attempts to merge 
the information contained in an epistemic list may increase the granularity level of 
information contained in the resulting epistemic state. 

Definition 6. For $n \geq 1$, a P-capped merging operation $\Delta$ is Q-bound for n iff for
every P-capped epistemic list $E$ s.t. $|E| = n$, $\Delta(E)$ is Q-capped, and for some P-capped
epistemic list $F$ s.t. $|F| = n$ and some $v \in V$, $\Delta(F)(v) = Q$. $\Delta$ is Q-bound iff
it is Q-bound for n for every $n \geq 1$. A P-capped merging operation which is Q-bound
for n is referred to as (P, Q, n)-capped. Similarly, a P-capped merging operation which
is Q-bound is referred to as (P, Q)-capped.

For $n \geq 1$, P-capped merging operations are (P, Q, n)-capped for some Q, but, as will
be seen in section 3.1, need not be (P, Q)-capped for some Q.



3 Basic Merging 

We are now in a position to provide some basic properties with which all P-capped
merging operations ought to comply. Our claim is not that these properties define
merging. Indeed, in section 5 we shall consider more desirable properties for merging
which cannot be derived from ($\Delta$1)-($\Delta$6).

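For the remainder of the paper we follow the convention that an epistemic list $E$ has
the form $[\Phi_1, \ldots, \Phi_{|E|}]$ and that an epistemic list $F$ has the form $[\Psi_1, \ldots, \Psi_{|F|}]$.

($\Delta$1) $\forall E, F \in \mathcal{E}^P$ s.t. $|E| = |F|$, if $\Phi_i(v) = \Psi_i(v)\ \forall i \in I^+(|E|)$, then
$\Delta(E)(v) = \Delta(F)(v)$

($\Delta$2) $\forall n \geq 1$, if $\Delta$ is Q-bound for n, then $\forall q \in I(Q)$ there is an $E \in \mathcal{E}^P$ and a $v$
s.t. $\Delta(E)(v) = q$

($\Delta$3) If there is a bijection $\pi : I^+(|E|) \to I^+(|F|)$ such that $\Phi_i = \Psi_{\pi(i)}\ \forall i \in I^+(|E|)$,
then $\Delta(E) = \Delta(F)$

($\Delta$4) If $\Phi_i(v) \leq \Phi_i(w)\ \forall i \in I^+(|E|)$, then $\Delta(E)(v) \leq \Delta(E)(w)$

($\Delta$5) If $\Delta(E)(v) \leq \Delta(E)(w)$, then $\Phi_i(v) \leq \Phi_i(w)$ for some $i \in I^+(|E|)$

($\Delta$6) If $\Phi_i(v) = \Phi_j(v)\ \forall i, j \in I^+(|E|)$, $\Phi_i(v) \leq \Phi_i(w)\ \forall i \in I^+(|E|)$, and
$\Phi_j(v) < \Phi_j(w)$ for some $j \in I^+(|E|)$, then $\Delta(E)(v) < \Delta(E)(w)$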

These properties need some explanation and motivation. ($\Delta$1) states that the rank that
$\Delta$ assigns to a valuation $v$ is independent of the ranks assigned to any of the other
valuations. This is similar in spirit to the property in social choice theory known as
the Independence of Irrelevant Alternatives [2] and is intended to capture a similar
intuition. This issue will be discussed in more detail in section 4. The adoption of ($\Delta$1)
enables us to define merging as an operation on sequences of natural numbers. Let
$seq^P = \{s \mid s = s_1, \ldots, s_n \text{ where } n \geq 1 \text{ and } s_i \in I(P)\ \forall i \in I^+(n)\}$. For $s \in seq^P$
we denote the size of $s$ by $|s|$.

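Proposition 1. Let $\Delta$ be a P-capped merging operation satisfying ($\Delta$1). Then there
is a $\delta : seq^P \to \mathbb{N}$ such that, $\forall v \in V$, $\forall E \in \mathcal{E}^P$, $\forall s \in seq^P$, if $|s| = |E|$ and
$s_i = \Phi_i(v)\ \forall i \in I^+(|E|)$, then $\delta(s) = \Delta(E)(v)$.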

Merging operations on sequences thus have an indirect connection with the merging
of epistemic states and it is only with the adoption of a property such as ($\Delta$1) that
this connection can be made explicit. ($\Delta$2) is a convexity assumption. It ensures that,
for a merging operation bound by Q for n, no rank from 0 to Q remains unused for
epistemic lists of size n. ($\Delta$3) ensures that the order in which epistemic states occur in
an epistemic list does not affect the outcome of merging. In [13,14] this property was
referred to as commutativity and in social choice theory it is known as anonymity [9].
($\Delta$3) rules out any notion of prioritised merging in which some sources are seen to be
more important, or trustworthy, than others. This does not mean that we deem prioritised
merging to be undesirable, but rather that prioritised merging depends on the existence
of rational merging operations in which all sources are equally reliable. Indeed, in [15]
it is shown that there is a unique method of lifting non-prioritised merging operations
into a prioritised setting. The adoption of ($\Delta$3) means that it would be possible to define
merging operations which receive inputs in the form of multisets or bags, instead of
lists, of epistemic states. It is our position, however, that such assumptions should rather
be made explicit, in the form of properties, instead of being encoded indirectly in the
representational formalism. The intuitions associated with ($\Delta$4), ($\Delta$5) and ($\Delta$6) have
been discussed in [13,14].

The following useful properties follow easily from the above properties.

Proposition 2. Let $\Delta$ be a P-capped merging operation.

1. If $\Delta$ satisfies ($\Delta$5) or ($\Delta$6) and $\Delta$ is Q-bound for n then $Q \geq P$.

2. If $\Delta$ satisfies either ($\Delta$6) or ($\Delta$4) and ($\Delta$5) then
$\Delta(E)(v) \geq \min\{\Phi_i(v) \mid i \in I^+(|E|)\}$.

3. If $\Delta$ satisfies ($\Delta$6) and $\exists i, j \in I^+(|E|)$ s.t. $\Phi_i(v) \neq \Phi_j(v)$, then
$\Delta(E)(v) > \min\{\Phi_i(v) \mid i \in I^+(|E|)\}$.

Part (1) of proposition 2 shows that the granularity level of information grows monotonically
with merging. Parts (2) and (3) of proposition 2 provide lower bounds on the ranks
assigned to valuations after merging. These results are all consistent with the intuition
that more information leads to an increase in the level of granularity.



3.1 Constructing Merging Operations 



In this section we briefly consider some methods for constructing merging operations
on epistemic states. This is not an exhaustive survey of merging operations found in
the literature. The intention is merely to show that there are constructions which satisfy
($\Delta$1)-($\Delta$6). We consider the following merging operations:

1. $\Delta_{max}(E)(v) = \max\{\Phi_i(v) \mid i \in I^+(|E|)\}$

2. $\Delta_{min1}(E)(v) = \begin{cases} \Phi_1(v) & \text{if } \Phi_i(v) = \Phi_j(v)\ \forall i, j \in I^+(|E|), \\ \min\{\Phi_i(v) \mid i \in I^+(|E|)\} + 1 & \text{otherwise} \end{cases}$

3. $\Delta_{min2}(E)(v) = \begin{cases} 2\Phi_1(v) & \text{if } \Phi_i(v) = \Phi_j(v)\ \forall i, j \in I^+(|E|), \\ 2\min\{\Phi_i(v) \mid i \in I^+(|E|)\} + 1 & \text{otherwise} \end{cases}$

4. $\Delta_{\Sigma}(E)(v) = \sum_{i \in I^+(|E|)} \Phi_i(v)$

These operations have been proposed and discussed in [10,5,4,13,14,16], amongst others.
Observe that $\Delta_{max}$ and $\Delta_{min1}$ are (P, P)-capped, $\Delta_{min2}$ is (P, 2P)-capped, but that
$\Delta_{\Sigma}$ is not (P, Q)-capped for any Q. We do know, however, that $\Delta_{\Sigma}$ is (P, nP, n)-capped
for every $n \geq 1$.

Proposition 3. $\Delta_{max}$, $\Delta_{min1}$, $\Delta_{min2}$ and $\Delta_{\Sigma}$ all satisfy ($\Delta$1)-($\Delta$6).
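
The four constructions, together with the lifting from rank sequences to epistemic lists described in Proposition 1, can be sketched in Python as follows. Epistemic states are encoded as dictionaries from valuations to ranks; this encoding, the helper names (`lift`, `d_max`, `d_min1`, `d_min2`, `d_sum`) and the sample list are assumptions of this sketch, not part of the paper.

```python
from itertools import product

def lift(delta):
    """Turn an operation on rank sequences into an operation on epistemic lists."""
    def merge(E):
        return {v: delta([phi[v] for phi in E]) for v in E[0]}
    return merge

def d_max(s):
    return max(s)

def d_min1(s):
    # Common rank if all sources agree, otherwise the minimum plus one.
    return s[0] if len(set(s)) == 1 else min(s) + 1

def d_min2(s):
    # Twice the common rank if all sources agree, otherwise 2 * minimum + 1.
    return 2 * s[0] if len(set(s)) == 1 else 2 * min(s) + 1

def d_sum(s):
    return sum(s)

OPS = {"max": d_max, "min1": d_min1, "min2": d_min2, "sum": d_sum}

# Hypothetical 2-capped epistemic list over two valuations v and w.
E = [{"v": 0, "w": 2}, {"v": 0, "w": 1}]
results = {name: lift(d)(E) for name, d in OPS.items()}

# Largest rank each operation can produce for P = 2 and lists of size n = 3.
P, n = 2, 3
bounds = {name: max(d(s) for s in product(range(P + 1), repeat=n))
          for name, d in OPS.items()}
```

For this instance the computed bounds are P, P, 2P and nP respectively, in line with the capping observations above; the enumeration is a sanity check on one choice of P and n, not a proof.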



4 Social Choice and Merging

Social choice theory [2,19] is a research area where the problems under scrutiny are
similar to the problems encountered in merging. Social choice theory is concerned with
the aggregation of preferences. An individual’s preferences are usually represented as a
total preorder $\preceq$ over a (finite) set of alternatives $\Omega$. For $x, y \in \Omega$, $x \preceq y$ means that
$x$ is at least as preferred as $y$. The interest then lies in the description of an aggregation
operation over the preferences of $n$ individuals, $\preceq_1, \ldots, \preceq_n$, which produces a new
preference ordering over $\Omega$. The similarities between this setup and our framework for
the merging of epistemic sets should be obvious. It is a matter of taking $\Omega$ to be $V$, the
set of permissible valuations, and using the total preorder on $V$ induced by an epistemic
state. Observe that such an induced total preorder contains less information than the
epistemic state from which it was induced.

One of the most important results in social choice theory is Arrow’s impossibility the- 
orem [2] which shows that there is no aggregation operation satisfying some intuitively 
desirable postulates. In this section we show that Arrow’s result can be circumvented 
when recast into the framework of epistemic states. 

Arrow’s first postulate, dubbed Restricted Range, requires the result of an aggregation
operation to be a total preorder on $\Omega$. In our setup this translates to the requirement that
a merging operation produce an epistemic state - something which is built into the
definition of merging. The second Arrow postulate, known as Unrestricted Domain,
states that one ought to be able to apply the aggregation operation to any n-tuple of
total preorders on $\Omega$. In our setup this translates to the requirement that merging may
be applied to any (P-capped) epistemic set - again, something that is built into the
definition of merging. The third Arrow postulate, known as the weak Pareto Principle,
can be phrased as follows for epistemic sets:

(PP) If $\Phi_i(v) < \Phi_i(w)\ \forall i \in I^+(|E|)$, then $\Delta(E)(v) < \Delta(E)(w)$

It is easily verified that (PP) is the contrapositive of ($\Delta$5).

The fourth Arrow postulate, known as the Independence of Irrelevant Alternatives,
translates to the following postulate:

(IIA) $\forall E, F \in \mathcal{E}^P$ s.t. $|E| = |F|$, [$\Phi_i(v) \leq \Phi_i(w)$ iff $\Psi_i(v) \leq \Psi_i(w)$] $\forall i \in I^+(|E|)$
implies that $\Delta(E)(v) \leq \Delta(E)(w)$ iff $\Delta(F)(v) \leq \Delta(F)(w)$

When deciding on the relative ordering of valuations $v$ and $w$, (IIA) requires of us to
disregard all other valuations. At least, that is the intuition. It is easily seen that the
intuition does not hold in our more structured setup in which it is possible to define
degrees of relative plausibility between valuations.

Example 1. Let $E = [\Phi_1, \Phi_2]$, $F = [\Psi_1, \Psi_2]$ such that $\Phi_1(v) = \Psi_2(w) = 0$,
$\Phi_1(w) = \Phi_2(w) = \Psi_1(v) = \Psi_2(v) = 1$, and $\Phi_2(v) = \Psi_1(w) = 2$. It is easily verified that
$\Phi_1(v) \leq \Phi_1(w)$ iff $\Psi_1(v) \leq \Psi_1(w)$ and that $\Phi_2(v) \leq \Phi_2(w)$ iff $\Psi_2(v) \leq \Psi_2(w)$. Now
consider the merging operation $\Delta_{max}$ defined in section 3.1. It can be verified that
$\Delta_{max}(E)(w) = \Delta_{max}(F)(v) = 1$, and $\Delta_{max}(E)(v) = \Delta_{max}(F)(w) = 2$. So it is not the case that
$\Delta_{max}(E)(v) \leq \Delta_{max}(E)(w)$ iff $\Delta_{max}(F)(v) \leq \Delta_{max}(F)(w)$. $\Delta_{max}$ therefore does
not satisfy (IIA). Observe, however, that the ranks of $v$ and $w$ are obtained without
reference to any of the other valuations. In fact, it is easy to see that the rank of any
valuation obtained by applying $\Delta_{max}$ is independent of all other alternatives, even
though $\Delta_{max}$ does not satisfy (IIA).
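
The computation in Example 1 can be replayed with a short Python sketch. The rank assignment below is the one read from the example (the extracted text is partly garbled, so it is a reconstruction that agrees with the merged ranks stated there); the dictionary encoding and the name `merge_max` are assumptions of this illustration.

```python
def merge_max(E):
    """Pointwise maximum of the ranks assigned by the sources in E."""
    return {v: max(phi[v] for phi in E) for v in E[0]}

E = [{"v": 0, "w": 1}, {"v": 2, "w": 1}]    # [Phi_1, Phi_2]
F = [{"v": 1, "w": 2}, {"v": 1, "w": 0}]    # [Psi_1, Psi_2]

# Source by source, v and w are ordered the same way in E and in F.
same_orderings = all(
    (phi["v"] <= phi["w"]) == (psi["v"] <= psi["w"]) for phi, psi in zip(E, F)
)

merged_e, merged_f = merge_max(E), merge_max(F)

# (IIA) would require the merged orderings of v and w to coincide as well.
iia_holds_here = (merged_e["v"] <= merged_e["w"]) == (merged_f["v"] <= merged_f["w"])
```

The premise of (IIA) is met while its conclusion fails, even though each merged rank is computed from the ranks of that valuation alone.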

In the usual social choice theory framework, where only total preorders are used, it
is necessary to define independence indirectly, in terms of the ordering between two
valuations. In our more structured setup this independence can be described directly,
in terms of the rank assigned to a valuation. Our contention, then, is that ($\Delta$1) is an
appropriate reformulation of (IIA).

The last of the Arrow postulates, known as Non-Dictatorship, states that one source
should never be able to completely dominate. We can phrase this as follows.

(ND) There is no $i \in I^+(|E|)$ such that, for every $E \in \mathcal{E}^P$, $\Phi_i(v) < \Phi_i(w)$ implies
$\Delta(E)(v) < \Delta(E)(w)$ for every $v, w \in V$

It is easy to see that (ND) follows from ($\Delta$3).

Proposition 4. If a merging operation satisfies ($\Delta$3) then it will also satisfy (ND).

From the results above it follows that the modified Arrow postulates all follow from
($\Delta$1)-($\Delta$6). And since proposition 3 shows that there are merging operations which satisfy
($\Delta$1)-($\Delta$6), we have shown that the Arrow impossibility result can be circumvented. It
is the move from the total preorders on $V$ to epistemic states, which have more structure
than mere total preorders, which makes this circumvention possible.



5 Strategy-Proof Merging 

Strategy-proofness is an idea that has received a great deal of attention in social choice
theory, where it is frequently discussed in the context of elections. The aim is to define an
election procedure in which a winner is chosen in such a way that the outcome is immune
to manipulation by voters, or groups of voters. The first impossibility result related to
strategy-proofness is due to Gibbard [8] and Satterthwaite [17,18]. Given some basic
conditions on the number of available alternatives and the size of the electorate, and the
(strong) requirement that an election procedure should produce a unique winner, their
result shows that no non-dictatorial election procedure can be strategy-proof. This
result is, perhaps, not particularly surprising. Consider, for example, the
case in which two voters, Jack and Jill, have to choose between two candidates, Al and
George. If Jack strictly prefers Al to George and Jill strictly prefers George to Al then
there simply is not enough information to declare either Al or George the unambiguous
winner. However, even if the requirement of producing a unique winner is relaxed it
seems that Gibbard-Satterthwaite type results still hold [6,3].

Our aim in this section is to investigate notions of strategy-proofness in the context
of merging. Requiring merging operations to be strategy-proof seems as necessary, and
as desirable, as is the case for election procedures, or indeed, for aggregation operations
in general. Before formalising the notion of strategy-proofness we first consider two
properties that allude to it. For $E, F \in \mathcal{E}^P$ s.t. $|E| = |F|$, and for $\mathcal{I} \subseteq I^+(|E|)$, we
denote by $rep(E, \mathcal{I}, F)$ the epistemic list obtained by replacing $\Phi_i$ with $\Psi_i$ for every
$i \in \mathcal{I}$. Intuitively $rep(E, \mathcal{I}, F)$ produces a modified version of $E$ in which the sources
mentioned in $\mathcal{I}$ have changed their preferences. For example, if $E = [\Phi_1, \Phi_2, \Phi_3]$ and
$F = [\Psi_1, \Psi_2, \Psi_3]$, then $rep(E, \{2, 3\}, F) = [\Phi_1, \Psi_2, \Psi_3]$.

(Mon$\uparrow$) if $\Phi_i(v) \leq \Psi_i(v)$ then $\Delta(E)(v) \leq \Delta(rep(E, \{i\}, F))(v)$

(Mon$\downarrow$) if $\Phi_i(v) \geq \Psi_i(v)$ then $\Delta(E)(v) \geq \Delta(rep(E, \{i\}, F))(v)$

(Mon$\uparrow$) and (Mon$\downarrow$) ensure that $\Delta$ exhibits monotonic behaviour. That is, (Mon$\uparrow$) states
that if a source worsens the rank it assigns to a valuation $v$, $\Delta$ will respond with a rank
for $v$ that is no better than the original. Similarly, (Mon$\downarrow$) states that if a source improves
the rank it assigns to a valuation $v$, $\Delta$ will respond with a rank for $v$ that is no worse
than the original.

Proposition 5. If $\Delta$ satisfies ($\Delta$4) then it also satisfies (Mon$\uparrow$) and (Mon$\downarrow$).

These two properties do not guarantee strategy-proof behaviour, however, as will be
shown in theorem 1. For a merging operation to be regarded as strategy-proof it must be
the case that there is no incentive for any source to misrepresent its preferences. To be
more precise, whenever a source provides an accurate representation of its preferences
there should be a guarantee that the result of merging will be no less compatible with its
true preferences than if it had misrepresented its preferences. Of course, the formalisation
of such properties presupposes the existence of an appropriate compatibility measure
between epistemic states. In the special case of two P-capped epistemic states there is
an obvious way to measure compatibility.

Definition 7. For $v \in V$, the compatibility measure between two P-capped epistemic
states $\Phi$ and $\Psi$ with respect to $v$ is $\#_v(\Phi, \Psi) = |\Phi(v) - \Psi(v)|$.

$\#_v(\Phi, \Psi)$ is a local compatibility measure in the sense that it measures compatibility
between valuations contained in epistemic states. Below we provide two global measures
in which compatibility is determined between epistemic states.

Definition 8. 1. The compatibility measure between two P-capped epistemic states
$\Phi$ and $\Psi$ is $\#(\Phi, \Psi) = \sum_{v \in V} \#_v(\Phi, \Psi)$.

2. An epistemic state $\Upsilon$ is at least as compatible with $\Phi$ as $\Psi$ is, denoted by
$\Upsilon \sqsubseteq_\Phi \Psi$, iff $\forall v \in V$, $\#_v(\Upsilon, \Phi) \leq \#_v(\Psi, \Phi)$. $\Upsilon$ is more compatible with $\Phi$ than
$\Psi$ is, denoted by $\Upsilon \sqsubset_\Phi \Psi$, iff $\Upsilon \sqsubseteq_\Phi \Psi$ and $\Psi \not\sqsubseteq_\Phi \Upsilon$.

It is easy to see that $\sqsubseteq_\Phi$ provides a stronger form of compatibility than $\#(\Phi, \Psi)$.

Proposition 6. If $\Upsilon \sqsubseteq_\Phi \Psi$ then $\#(\Upsilon, \Phi) \leq \#(\Psi, \Phi)$, but the converse does not hold.
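
A small Python sketch may help to see why the converse in Proposition 6 fails. It assumes the pointwise reading of the relation in Definition 8 (the reading suggested by Proposition 6 itself); the function names and the three sample states are hypothetical.

```python
def local_compat(phi, psi, v):
    """Definition 7: distance between the ranks that phi and psi give to v."""
    return abs(phi[v] - psi[v])

def global_compat(phi, psi):
    """Definition 8(1): sum of the local measures over all valuations."""
    return sum(local_compat(phi, psi, v) for v in phi)

def at_least_as_compatible(ups, psi, phi):
    """Assumed reading of Definition 8(2): ups is pointwise no further from phi than psi."""
    return all(local_compat(ups, phi, v) <= local_compat(psi, phi, v) for v in phi)

# Three 2-capped epistemic states over valuations u, v, w.
phi = {"u": 0, "v": 0, "w": 0}
ups = {"u": 2, "v": 0, "w": 0}
psi = {"u": 1, "v": 1, "w": 1}

sum_ups = global_compat(ups, phi)
sum_psi = global_compat(psi, phi)
ups_below_psi = at_least_as_compatible(ups, psi, phi)
psi_below_ups = at_least_as_compatible(psi, ups, phi)
```

Here the summed measure ranks `ups` as closer to `phi` than `psi`, yet neither state is pointwise at least as compatible as the other, so the pointwise relation is strictly more demanding and is not total.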

Definitions 7 and 8 make sense only when comparing P-capped epistemic states. Defining
compatibility between P-capped and Q-capped epistemic states, where $Q \neq P$,
is more problematic. We shall briefly address this issue in section 6. For the rest of
this section, however, we focus on (P, P)-capped merging operations, thereby ensuring
that we only need to compare P-capped epistemic states. This enables us to formalise
strategy-proofness as follows.

* In the properties (Mon$\uparrow$) and (Mon$\downarrow$), the set $\mathcal{I}$ in $rep(E, \mathcal{I}, F)$ is the singleton set $\{i\}$, from
which it might seem that the notation $rep(E, \mathcal{I}, F)$ is unnecessarily clumsy. However, in some
of the properties later in the section we shall use this notation with $\mathcal{I}$ being any subset of
$I^+(|E|)$.

(IP) $\Delta(E) \sqsubseteq_{\Phi_i} \Delta(rep(E, \{i\}, F))\ \forall F \in \mathcal{E}^P$ s.t. $|E| = |F|$

(WIP) $\#(\Delta(E), \Phi_i) \leq \#(\Delta(rep(E, \{i\}, F)), \Phi_i)\ \forall F \in \mathcal{E}^P$ s.t. $|E| = |F|$

Both (IP) and (WIP) require $\Delta(E)$ to be at least as compatible with the preferences
of source $i$ as $\Delta(rep(E, \{i\}, F))$, where $E$ is the epistemic list in which $i$’s preferences are
represented accurately and $rep(E, \{i\}, F)$ is obtained from $E$ by $i$ misrepresenting its
preferences in some way. Given these properties a rational source will realise that it is in
its own interest to represent its preferences accurately. Observe that (IP) implies (WIP).

Proposition 7. A merging operation satisfying (IP) will also satisfy (WIP). 

(IP) and (WIP) define strategy-proofness only relative to single sources and do not
exclude the possibility of groups of sources misrepresenting their preferences in such
a way that all members of the group benefit from it. Groups of sources that are able to
achieve this will be referred to as strategy-coalitions. The formal definition of a
strategy-coalition depends on the compatibility measure that is used.

Definition 9. Let $E \in \mathcal{E}^P$ and consider any group of sources $\mathcal{I} \subseteq I^+(|E|)$.

1. $\mathcal{I}$ is a strategy-coalition in $E$ iff $\exists F \in \mathcal{E}^P$ s.t. $|E| = |F|$,
$\Delta(rep(E, \mathcal{I}, F)) \sqsubseteq_{\Phi_i} \Delta(E)\ \forall i \in \mathcal{I}$, and $\Delta(rep(E, \mathcal{I}, F)) \sqsubset_{\Phi_j} \Delta(E)$ for some $j \in \mathcal{I}$.

2. $\mathcal{I}$ is a weak strategy-coalition in $E$ iff for some $F \in \mathcal{E}^P$ s.t. $|E| = |F|$,
$\#(\Delta(rep(E, \mathcal{I}, F)), \Phi_i) \leq \#(\Delta(E), \Phi_i)\ \forall i \in \mathcal{I}$, and
$\#(\Delta(rep(E, \mathcal{I}, F)), \Phi_j) < \#(\Delta(E), \Phi_j)$ for some $j \in \mathcal{I}$.

The following strategy-coalition proof properties are intended to exclude the possibility
of forming strategy-coalitions.

(SP) $\forall E \in \mathcal{E}^P$ there is no strategy-coalition in $E$
(WSP) $\forall E \in \mathcal{E}^P$ there is no weak strategy-coalition in $E$

The following proposition shows that there are connections between strategy-proofness
and the various ways of forming strategy-coalitions.

Proposition 8. Strategy-coalitions are also weak strategy-coalitions. Therefore (WSP)
implies (SP). Furthermore (WSP) implies (WIP), but (SP) does not imply (IP).

The reason for (SP) not implying (IP) is that $\sqsubseteq_\Phi$ is not a total preorder, unlike the weaker
measure based on $\#(\Phi, \Psi)$.

In addition to misrepresenting preferences, it is conceivable that sources may stand
to benefit by completely abstaining from providing information. For $E \in \mathcal{E}^P$ and for
some $\mathcal{I} \subseteq I^+(|E|)$ we denote by $rem(E, \mathcal{I})$ the epistemic list obtained by removing
$\Phi_i$ from $E$ for every $i \in \mathcal{I}$. Intuitively, $rem(E, \mathcal{I})$ produces a modified version of $E$
in which the sources mentioned in $\mathcal{I}$ abstain from providing information. For example,
if $E = [\Phi_1, \Phi_2, \Phi_3]$ then $rem(E, \{1, 3\}) = [\Phi_2]$. We define a group of sources to be
an abstention-coalition if, whenever all members of the group abstain, they all stand to
benefit from doing so.

Definition 10. Let $E \in \mathcal{E}^P$ and consider any group of sources $\mathcal{I} \subseteq I^+(|E|)$.

1. $\mathcal{I}$ is an abstention-coalition in $E$ iff $\Delta(rem(E, \mathcal{I})) \sqsubseteq_{\Phi_i} \Delta(E)\ \forall i \in \mathcal{I}$, and
$\Delta(rem(E, \mathcal{I})) \sqsubset_{\Phi_j} \Delta(E)$ for some $j \in \mathcal{I}$.

2. $\mathcal{I}$ is a weak abstention-coalition in $E$ iff $\#(\Delta(rem(E, \mathcal{I})), \Phi_i) \leq \#(\Delta(E), \Phi_i)$
$\forall i \in \mathcal{I}$, and $\#(\Delta(rem(E, \mathcal{I})), \Phi_j) < \#(\Delta(E), \Phi_j)$ for some $j \in \mathcal{I}$.

The next properties are intended to prevent sources from benefitting by abstaining.

(AP) $\forall E \in \mathcal{E}^P$ there is no abstention-coalition in $E$
(WAP) $\forall E \in \mathcal{E}^P$ there is no weak abstention-coalition in $E$

The notion of an abstention-coalition is a stronger notion than that of a weak
abstention-coalition.

Proposition 9. Any abstention-coalition is also a weak abstention-coalition and therefore
(WAP) implies (AP).

At present it is not clear whether an insistence on the absence of both strategy-coalitions
and abstention-coalitions necessarily implies the absence of a combination of these
notions. Thus, it seems necessary to provide properties forbidding the combination as well.
Consider groups of sources $\mathcal{I}, \mathcal{J} \subseteq I^+(|E|)$ and $\mathcal{K} \subseteq I^+(|E|)$ such that $\mathcal{I} = \mathcal{J} \cup \mathcal{K}$ and
$\mathcal{J} \cap \mathcal{K} = \emptyset$. For $F \in \mathcal{E}^P$ s.t. $|E| = |F|$ we let $rr(E, \mathcal{J}, F, \mathcal{K}) = rem(rep(E, \mathcal{J}, F), \mathcal{K})$.
That is, $rr(E, \mathcal{J}, F, \mathcal{K})$ is the result obtained by first replacing $\Phi_j$ in $E$ with $\Psi_j$ for every
$j \in \mathcal{J}$ and then removing $\Phi_k$ from the modified $E$ for every $k \in \mathcal{K}$. For example, for
$E = [\Phi_1, \Phi_2, \Phi_3, \Phi_4]$ and $F = [\Psi_1, \Psi_2, \Psi_3, \Psi_4]$, $rr(E, \{2\}, F, \{1, 4\}) = [\Psi_2, \Phi_3]$.
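
The three list manipulations used in this section (replacement, removal, and their combination) can be sketched in Python as below. Epistemic lists are represented as ordinary Python lists and sources are addressed by their 1-based positions, as in the text; the placeholder strings stand in for epistemic states and are purely illustrative.

```python
def rep(E, group, F):
    """Replace the sources whose 1-based positions are in group by those of F."""
    return [F[i] if i + 1 in group else E[i] for i in range(len(E))]

def rem(E, group):
    """Drop the sources whose 1-based positions are in group."""
    return [phi for pos, phi in enumerate(E, start=1) if pos not in group]

def rr(E, J, F, K):
    """First replace the sources in J, then remove the sources in K."""
    return rem(rep(E, J, F), K)

E = ["Phi1", "Phi2", "Phi3", "Phi4"]
F = ["Psi1", "Psi2", "Psi3", "Psi4"]

replaced = rep(E[:3], {2, 3}, F[:3])
removed = rem(E[:3], {1, 3})
combined = rr(E, {2}, F, {1, 4})
```

The three calls reproduce the examples given in the text for rep, rem and rr. Positions are resolved against the original list in both steps, which is safe because replacement does not change the length of the list.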

Definition 11. Let $E \in \mathcal{E}^P$ and consider groups of sources $\mathcal{I}, \mathcal{J} \subseteq I^+(|E|)$ and
$\mathcal{K} \subseteq I^+(|E|)$ such that $\mathcal{I} = \mathcal{J} \cup \mathcal{K}$ and $\mathcal{J} \cap \mathcal{K} = \emptyset$.

1. $\mathcal{I}$ forms a coalition in $E$ iff $\exists F \in \mathcal{E}^P$ s.t. $|E| = |F|$,
$\Delta(rr(E, \mathcal{J}, F, \mathcal{K})) \sqsubseteq_{\Phi_i} \Delta(E)\ \forall i \in \mathcal{I}$, and
$\Delta(rr(E, \mathcal{J}, F, \mathcal{K})) \sqsubset_{\Phi_j} \Delta(E)$ for some $j \in \mathcal{I}$.

2. $\mathcal{I}$ forms a weak coalition in $E$ iff for some $F \in \mathcal{E}^P$ such that $|E| = |F|$,
$\#(\Delta(rr(E, \mathcal{J}, F, \mathcal{K})), \Phi_i) \leq \#(\Delta(E), \Phi_i)\ \forall i \in \mathcal{I}$, and
$\#(\Delta(rr(E, \mathcal{J}, F, \mathcal{K})), \Phi_j) < \#(\Delta(E), \Phi_j)$ for some $j \in \mathcal{I}$.

So, a coalition, and a weak coalition, is a group of sources for which it is possible to
either misrepresent their preferences or abstain from providing information in such a
way that all members of the group stand to benefit. The next two properties forbid the
existence of coalitions and weak coalitions respectively.

(CP) $\forall E \in \mathcal{E}^P$ there is no coalition in $E$
(WCP) $\forall E \in \mathcal{E}^P$ there is no weak coalition in $E$

There are close links between being strategy-proof and the various notions related to
coalitions.

Proposition 10. Any coalition is also a weak coalition and therefore (WCP) implies
(CP). Also, (CP) implies (SP) and (AP) and (WCP) implies (WSP) and (WAP).

Recall from section 3.1 that $\Delta_{max}$ and $\Delta_{min1}$ are (P, P)-capped merging operations.
It turns out that one of these satisfies all the properties related to strategy-proofness
discussed here, and the other one doesn’t.

Proposition 11. $\Delta_{max}$ satisfies (WCP) and (IP). $\Delta_{min1}$ satisfies (AP) and (WAP), but it
satisfies neither (WIP) nor (CP).

The following result summarises the main contributions of this paper.

Theorem 1. $\Delta_{max}$ satisfies ($\Delta$1)-($\Delta$6), (WIP), (IP), (WSP), (SP), (WAP), (AP), (WCP)
and (CP). $\Delta_{min1}$ satisfies ($\Delta$1)-($\Delta$6), (AP) and (WAP), but does not satisfy (WIP), (IP),
(WSP), (SP), (WCP) or (CP).
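
The contrast between the two operations with respect to (WIP) can be checked by brute force on a very small instance. The sketch below is a sanity check rather than a proof: it assumes two valuations, P = 2 and two sources, and all names (`STATES`, `merge_max`, `merge_min1`, `distance`, `satisfies_wip`) are introduced here for illustration.

```python
from itertools import product

V = ("v", "w")
P = 2
# Every P-capped epistemic state over V.
STATES = [dict(zip(V, ranks)) for ranks in product(range(P + 1), repeat=len(V))]

def merge_max(E):
    return {v: max(phi[v] for phi in E) for v in V}

def merge_min1(E):
    merged = {}
    for v in V:
        ranks = [phi[v] for phi in E]
        merged[v] = ranks[0] if len(set(ranks)) == 1 else min(ranks) + 1
    return merged

def distance(phi, psi):
    """Global compatibility measure: sum of absolute rank differences."""
    return sum(abs(phi[v] - psi[v]) for v in V)

def satisfies_wip(merge, n=2):
    """True iff no single source can get strictly closer to its true state by lying."""
    for E in product(STATES, repeat=n):
        for i in range(n):
            honest = distance(merge(E), E[i])
            for lie in STATES:
                manipulated = list(E)
                manipulated[i] = lie
                if distance(merge(manipulated), E[i]) < honest:
                    return False
    return True

wip_max = satisfies_wip(merge_max)
wip_min1 = satisfies_wip(merge_min1)
```

On this instance the maximum-based operation passes the check while the first minimum-based operation does not: for example, a source ranking v at 1 against another source ranking it at 2 obtains the merged rank 2 when honest, but the merged rank 1 when it reports 0 instead.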

Theorem 1 can be seen as a circumvention of Gibbard-Satterthwaite style impossibility
results. It shows that, in the context of epistemic states, it is possible to define rational
merging operations which are immune to manipulation by single sources and, indeed, by
groups of sources. In addition, the fact that $\Delta_{min1}$ does not satisfy the properties related
to strategies and coalitions shows that these properties cannot be derived from the basic
properties for merging and that their addition constitutes a strict extension of ($\Delta$1)-($\Delta$6).
Our conjecture is that the same holds for the abstention-related properties, although we
have not formally shown this to be the case.

6 Conclusion 

In this paper we have drawn connections between information merging and social 
choice theory and shown that Arrow’s impossibility result and versions of the Gibbard- 
Satterthwaite theorem disappear when recast in terms of epistemic states. The results 
described here need to be elaborated upon, though. In section 5 the focus was on the 
special case of (P, P)-capped merging operations since measures of compatibility between
P-capped epistemic states are then easily obtained. In the general case, however,
where we are dealing with a (P, Q)-capped merging operation, we are faced with the 
problem of comparing epistemic states with different levels of granularity. For example, 
$\Delta_{min2}$ defined in section 3.1 is a (P, 2P)-capped merging operation, making it necessary
to define an appropriate way of comparing P-capped epistemic states with 2P-capped 
ones. Currently it is unclear how to do so. One possible way to deal with this issue is to 
provide an appropriate method for mapping epistemic states with a high granularity level 
into epistemic states with the appropriate lower level of granularity. Such a mapping can 
be seen as a way to “convert” a Q-capped epistemic state to a P-capped one, thus making
it possible to compare the two epistemic states. For example, for $\Delta_{min2}$ we need a
suitable method for mapping the elements of P(2P) to P(P). In this case the appropriate
mapping seems to be the function $p : I(2P) \to P(P)$ such that $p(i) = \lceil i/2 \rceil$ (where
$\lceil i/2 \rceil$ denotes the smallest integer which is no smaller than $i/2$). However, it is not clear
how to determine which mappings are appropriate in the general case. At present the best
we can do is to insist that a mapping $p$ which converts a Q-capped epistemic state to a
P-capped one should be a surjective function from I(Q) to P(P) such that $p(i) \leq p(j)$
whenever $i \leq j$.
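
The two requirements on a conversion mapping can be illustrated with a minimal sketch. It assumes, purely for illustration, that a P-capped epistemic state takes integer ranks in 0..P; the names `halve_rank` and `is_admissible` are hypothetical and do not come from the paper.

```python
def halve_rank(i: int) -> int:
    """Ceiling of i/2, computed with integer arithmetic only."""
    return (i + 1) // 2


def is_admissible(mapping, q: int, p: int) -> bool:
    """Check that `mapping` sends 0..q onto 0..p and never decreases.

    Assumption: ranks are the integers 0..q (source) and 0..p (target).
    """
    ranks = [mapping(i) for i in range(q + 1)]
    surjective = set(ranks) == set(range(p + 1))
    monotone = all(a <= b for a, b in zip(ranks, ranks[1:]))
    return surjective and monotone


if __name__ == "__main__":
    P = 3
    # The ceiling mapping from 2P-capped ranks to P-capped ranks.
    print(is_admissible(halve_rank, 2 * P, P))
    # A mapping that misses the top rank is rejected.
    print(is_admissible(lambda i: i // 3, 2 * P, P))
```

Several different mappings can pass this check for the same pair (Q, P), which is consistent with the remark that surjectivity and monotonicity alone do not single out an appropriate mapping.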
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Abstract. This paper addresses the problem of merging data provided 
by several sources of information. It aims at comparing two techniques
of merging, one provided by Theory of Evidence and the other provided
in the field of logic for merging knowledge-bases.

To do so, it first presents a logical interpretation of Theory of Evidence,
which is proved to be valid when the numbers are rational. Then,
it shows the equivalence between the Maximum of Plausibility Decision
Strategy of Theory of Evidence and a particular knowledge-base merging
operator, known to be both a majority and an arbitration operator.



Keywords: Data merging, Theory of Evidence, logic.

1 Introduction 

Merging data is a problem which has received a lot of attention during these last 
years [2], [3], [4], [19], [13], [14], [16], [11], [10], [12]. This is due to the growing 
number of applications in which one needs to access several information sources 
to make a decision. 

The many works which address this problem show that there is no
unique method for merging information. Obviously, the appropriate merging
process depends on the type of the information to be merged, which can be beliefs
the sources have about the real world or preferences, i.e., descriptions of more or
less ideal worlds. But the merging process also depends on the meta-information
one has about the sources. For instance, in the case of merging beliefs provided
by several sources, if the respective reliability of the sources is known, it
obviously must be used in the merging process: the more reliable a source is, the
more we trust it. But, if this meta-information is not known, some other types of
merging processes must be defined. Konieczny and Pino-Perez’s work [11], [10]
addresses this last case by defining two kinds of merging operators, respectively
called majority merging operators and arbitration merging operators. The former
aim at implementing a kind of majority vote between the sources, and the
latter aim at reaching a consensus between the sources by trying to satisfy
all of them as much as possible.
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Furthermore, in the case of beliefs merging, the sources may be more or less 
certain about the data they deliver. In that case, formalisms for reasoning about 
uncertainty must be used. 

One of these formalisms is Theory of Evidence, which is a mathematical
theory defined by Dempster [6] and Shafer [17], allowing one to reason with
uncertainty and offering a rule for combining uncertain data provided by several
information sources. In this theory, the uncertainty is represented by the
fact that any proposition of the frame of discernment is associated with a real
number, called its mass, which belongs to [0,1] and such that the mass of the
contradiction is 0¹, and the sum of all the masses is 1. These numbers are intended to
represent a measure of belief committed exactly to the proposition. Given all the
masses, one can define two other numbers: the degree of belief of a proposition,
which is also a real number belonging to [0,1] and which represents the degree of
support a body of evidence provides for this proposition, and the degree of
plausibility of a proposition, which is also a real number belonging to [0,1] and which
represents the extent to which one fails to doubt the proposition. Furthermore,
this theory also focuses on the combination of masses through Dempster’s rule of
combination. This explains why Theory of Evidence is commonly applied in data
fusion problems where data are uncertain. For instance, in object identification
problems (i.e., situation assessment [1], candidate assessment [7], decision from
MRI images [8], data association [15]) the point is that several sources provide
their own beliefs about an observed situation, and the problem is to decide which
is the actual situation. To do so, Theory of Evidence provides different decision
strategies. For instance, the possible worlds may be ordered according to
their degree of belief or to their degree of plausibility, and the associated decision
strategy consists in selecting the maximal world according to this order. This
amounts to considering that the actual world is the one which has the highest degree
of belief or the highest degree of plausibility.

This paper aims at establishing some relations between methods for combining
data in Theory of Evidence and some methods provided in the field of logic
for merging knowledge-bases.

To do so, it first presents Theory of Evidence in the light of classical
propositional logic and establishes a relation between these two approaches to
information representation. Then, it establishes some formal relations between
the strategy of Maximum of Plausibility and a merging operator defined by
Konieczny and Pino-Perez.

This paper is organized as follows. 

Section 2 presents an interpretation, in a logical setting, of the main concepts 
of Theory of Evidence. Section 3 and section 4 prove the formal equivalence 
between the Maximum of Plausibility Decision Strategy of Theory of Evidence 
and a particular knowledge-bases merging operator known to be a majority and 
an arbitration one. Finally, the last section lists some open questions. 



¹ In some extensions of this theory, this constraint is relaxed. But we will focus on the
initial version of Theory of Evidence.
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2 A logical interpretation of Theory of Evidence when 
numbers are rationals 

In this section, we present an interpretation of Theory of Evidence² in logical
terms. But first, we need to recall some definitions.

2.1 Preliminaries 

Definition 1. A multi-set is a set where repeated occurrences of an element
may exist. A knowledge-set is a multi-set of propositional formulas³.

Example. If A and B are propositional letters, then $KS = [B, A \vee B, A, A \vee B, A \vee B]$
is a knowledge-set.

Definition 2. Let $KS_1 = [F_1, ..., F_n]$ and $KS_2 = [F_{n+1}, ..., F_p]$ be two
knowledge-sets. Their union is: $KS_1 \cup KS_2 = [F_1, ..., F_n, F_{n+1}, ..., F_p]$.

Definition 3. Let $KS_1 = [F_1, ..., F_n]$ and $KS_2 = [G_1, ..., G_n]$ be two knowledge-sets
of the same size. $KS_1$ and $KS_2$ are equivalent, noted $KS_1 \leftrightarrow KS_2$, iff there
exists a bijection $f$ from $KS_1$ to $KS_2$ such that $\forall i \in \{1...n\}$, $\models F_i \leftrightarrow f(F_i)$.

2.2 Main concepts of Theory of Evidence 

Theory of Evidence assumes the existence of a frame of discernment, which is
defined as a set $\Theta$ of $N$ hypotheses: $\Theta = \{H_1, ..., H_N\}$. Hypotheses correspond to
the propositions one is dealing with. Their intuitive meaning depends on the context
of application. For instance, in an identification problem, the hypothesis $H_i$ will
represent the fact “the object to be identified is $H_i$”. The meaning given to
hypotheses is outside the scope of the Theory of Evidence. These hypotheses are supposed to be
exclusive. This means, for instance, that the object to be identified cannot be
both $H_i$ and $H_j$ if $i \neq j$. Furthermore, in the initial version of Theory of Evidence
(and we will focus on it), the hypotheses are supposed to be exhaustive. This
means that the object to be identified is $H_1$ or ... or $H_N$. This assumption is called
the Closed-World Assumption. Finally, in the Theory of Evidence, propositions are
represented by subsets of $\Theta$. The set $2^{\Theta}$ is called the Referential of definition.

Let $\Theta = \{H_1, ..., H_N\}$ be a frame of discernment. In the logical formulation,
we will say that $\Theta$ is a propositional language whose propositional letters
are $H_1, ..., H_N$. Under the Closed-World Assumption and under the exclusivity
hypothesis, we will consider the axioms:

(CW) $H_1 \vee ... \vee H_N$ and (EXCL) $\neg(H_i \wedge H_j)$ if $i \neq j$

This implies that the possible worlds are the $N$ worlds $w_1, ..., w_N$ where $w_i$
is the world in which only $H_i$ is true. Thus, the problem of identification is the
problem of determining which, among these possible worlds, is the actual world.

² Due to space limitations, we address neither conditioning nor discounting.

³ One must notice that Konieczny and Pino-Perez define a knowledge-set as a multi-set
of sets of formulas, but considering formulas is enough, by identifying a set of
formulas with the conjunction of its formulas.
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We denote EV the theory whose proper axioms are (CW) and (EXCL). As 
usual, the relation of logical consequence will be denoted by $\models$.

One can notice that in theory EV, any proposition is equivalent to a positive
clause⁴. So here, the Referential of Definition is the set of positive clauses of $\Theta$.
This means that any subset of the frame of discernment is logically represented
by a positive propositional clause. For instance, the subset $\{H_1, H_2, H_3\}$ of $\Theta$ is
logically represented by the positive clause $H_1 \vee H_2 \vee H_3$.

The basic notions of the Theory of Evidence are the notion of basic probability
assignment (or mass function), the notion of belief function, associated
with an information source which, in this way, expresses its uncertainty about
beliefs, and the notion of plausibility function. They are mathematically defined
by:

A basic probability assignment is a function $m : 2^{\Theta} \to [0, 1]$ such that:
(i) $m(\emptyset) = 0$ and (ii) $\sum_{A \subseteq \Theta} m(A) = 1$

A belief function is a function $Bel_m : 2^{\Theta} \to [0, 1]$ which associates any $A$
of $2^{\Theta}$ to $Bel_m(A)$ with:

$$Bel_m(A) = \sum_{B \subseteq A} m(B)$$

A plausibility function is a function $Pl_m : 2^{\Theta} \to [0, 1]$ which associates
any $A$ to $Pl_m(A)$ with $Pl_m(A) = 1 - Bel_m(\bar{A})$, where $\bar{A}$ is the complement
of $A$ in $\Theta$.

Shafer gave the following informal interpretation of these numbers: the basic
probability number $m(A)$ is understood to be the measure of belief committed
exactly to $A$ (... but) not the total belief that one commits to $A$. To obtain the
measure of the total belief committed to $A$, one must add to $m(A)$ the quantities
$m(B)$ for all proper subsets $B$ of $A$. Finally, the degree of plausibility represents
the extent to which one fails to doubt the proposition. So, the problem for us
is to give, in logical terms, a meaning to these comments. This is done in the
remainder of this section, for the case when the numbers are rational.

The logical representation of a basic assignment is given by the following
definition:

Definition 4. Let $m$ be a basic assignment defined on $\Theta$ by:
$m(P_1) = n_1/N, ..., m(P_k) = n_k/N$ with $n_1 + ... + n_k = N$.

The logical representation of $m$ is the knowledge-set denoted $ks(m)$ defined by:
$ks(m) = [K_1, ..., K_N]$ with: $K_1 = ... = K_{n_1} = \{P_1\}$, $K_{n_1+1} = ... = K_{n_1+n_2} = \{P_2\}$,
..., $K_{N-n_k+1} = ... = K_N = \{P_k\}$

Notation. One must notice that, in this definition, the same symbol $P_i$ is
used to denote a subset of the frame of discernment $\Theta$ and the propositional
clause, of the propositional language associated with $\Theta$, which logically represents
it. However, it must be clear that $m$ is defined on subsets while $ks(m)$ is a
multi-set of propositional positive clauses.

⁴ A positive clause is a disjunction of positive literals.

Example. Let $m(A) = 2/3$, $m(A, B) = 1/3$. The logical representation of $m$
is the knowledge-set: $ks(m) = [A, A, A \vee B]$.

Proposition 1. Let $m$ be a basic assignment. The mass of a proposition $P$,
$m(P)$, is the proportion of formulas $K_i$ in $ks(m)$ which are equivalent, under
EV, to $P$, i.e., the proportion of $K_i$ in $ks(m)$ such that: $EV \models K_i \leftrightarrow P$.

Proposition 2. Let $m$ be a basic assignment. The degree of belief of a
proposition $P$, $Bel_m(P)$, is the proportion of $K_i$ in $ks(m)$ which, under EV,
imply $P$, i.e., the proportion of $K_i$ in $ks(m)$ such that: $EV \models K_i \rightarrow P$.

Example (continued). The three formulas of $ks(m)$ imply $A \vee B$, thus
$Bel_m(A \vee B) = 1$. But only two formulas imply $A$, so $Bel_m(A) = 2/3$.

Proposition 3. Let $m$ be a basic assignment. The degree of plausibility of
a proposition $P$, $Pl_m(P)$, is the proportion of $K_i$ in $ks(m)$ which do not, under
EV, imply $\neg P$, i.e., the proportion of $K_i$ in $ks(m)$ such that: $EV \wedge K_i \wedge P$ is
consistent.

Example (continued). All the formulas in $ks(m)$ are consistent (under
EV) with $A$, so $Pl_m(A) = 1$. Only one formula is consistent (under EV) with
$B$, so $Pl_m(B) = 1/3$.

To sum up this section, we can say that any basic assignment as defined by
Shafer can, if the numbers are rational, be modelled by a knowledge-set as defined
previously. Thus, the mass of a proposition A is the proportion of formulas in
this knowledge-set which are equivalent, under EV, to A. The degree of belief
of A is the proportion of formulas which imply, under EV, A. The plausibility
degree of A is the proportion of formulas which are consistent, under EV, with
A.
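
The correspondence summarised above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a positive clause is encoded as the `frozenset` of the hypotheses it mentions (so that, under EV, implication becomes set inclusion and consistency becomes non-empty intersection), and the function names `ks`, `mass_of`, `belief` and `plausibility` are hypothetical.

```python
from fractions import Fraction
from math import lcm


def ks(m):
    """Knowledge-set of a rational mass function, in the spirit of Definition 4.

    `m` maps each focal set (a frozenset of hypotheses, standing for the
    positive clause over those hypotheses) to its mass as a Fraction.
    """
    n = lcm(*(mass.denominator for mass in m.values()))
    clauses = []
    for focal, mass in m.items():
        clauses.extend([focal] * int(mass * n))
    return clauses


def proportion(clauses, keep):
    """Share of clauses in the multi-set that satisfy the predicate `keep`."""
    return Fraction(sum(1 for k in clauses if keep(k)), len(clauses))


def mass_of(clauses, p):
    # Proposition 1: clauses equivalent to p under EV.
    return proportion(clauses, lambda k: k == p)


def belief(clauses, p):
    # Proposition 2: clauses that imply p under EV (subset test).
    return proportion(clauses, lambda k: k <= p)


def plausibility(clauses, p):
    # Proposition 3: clauses consistent with p under EV (shared hypothesis).
    return proportion(clauses, lambda k: bool(k & p))


if __name__ == "__main__":
    A, AB, B = frozenset("A"), frozenset("AB"), frozenset("B")
    clauses = ks({A: Fraction(2, 3), AB: Fraction(1, 3)})
    print(belief(clauses, A), plausibility(clauses, B))
```

The set encoding is a convenience that relies on the axioms (CW) and (EXCL); with arbitrary formulas one would need an actual entailment check instead.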

2.3 Dempster’s rule of combination

Dempster’s rule of combination has been defined for combining several basic
assignments. In the following, we focus on the case of two assignments only (the
extension to the general case is obvious since the combination is commutative
and associative) and we assume that the two assignments are defined on the same
frame of discernment. Given two basic probability assignments $m_1$ and $m_2$ defined
over a frame $\Theta$, Dempster’s rule of combination defines a third basic assignment
denoted $m_1 \oplus m_2$ by the following equation:

$$(m_1 \oplus m_2)(A) = \frac{\sum_{A_i \cap B_j = A} m_1(A_i) \cdot m_2(B_j)}{N}$$

with

$$N = \sum_{A_i \cap B_j \neq \emptyset} m_1(A_i) \cdot m_2(B_j)$$

Obviously, the fraction has a meaning only if $N \neq 0$. This assumption corresponds
to the case when the two basic probability assignments are not totally
in conflict. In the following, we give the correspondence, in terms of knowledge-sets,
of this rule.

Definition 5. Let $KS_1 = [K^1_1, ..., K^1_n]$ and $KS_2 = [K^2_1, ..., K^2_n]$ be two
knowledge-sets of the same size. We say that they are in total conflict iff
$\forall K^1_i \in KS_1$, $\forall K^2_j \in KS_2$, $EV \cup \{K^1_i \wedge K^2_j\}$ is inconsistent.

Proposition 4. $ks(m_1)$ and $ks(m_2)$ are in total conflict iff $N = 0$.

Let us now define an operator, also denoted $\oplus$, which combines two knowledge-sets
which are not in total conflict.

Definition 6. Let $KS_1 = [K^1_1, ..., K^1_n]$ and $KS_2 = [K^2_1, ..., K^2_n]$ be two
knowledge-sets of the same size which are not in total conflict. The operator $\oplus$
on knowledge-sets defines a third knowledge-set by:

$KS_1 \oplus KS_2 = [K : \exists K^1_i \in KS_1$ and $\exists K^2_j \in KS_2$ such that:
$EV \cup \{K^1_i \wedge K^2_j\}$ is consistent and
$EV \models (K^1_i \wedge K^2_j) \leftrightarrow K]$

Proposition 5. Let $m_1$ and $m_2$ be two basic assignments which are not in
total conflict and let $ks(m_1)$ and $ks(m_2)$ be their logical representations. Then,
$ks(m_1 \oplus m_2) \leftrightarrow ks(m_1) \oplus ks(m_2)$.

This proves that the operator $\oplus$ on knowledge-sets corresponds to the logical
interpretation of Dempster’s rule of combination.

Example. Let $m_1$ and $m_2$ be two basic assignments defined by: $m_1(A, B) = 1/2$,
$m_1(B) = 1/2$ and $m_2(A) = 1/3$, $m_2(A, B) = 1/3$, $m_2(B) = 1/3$. Dempster’s
rule defines the basic assignment $m_1 \oplus m_2$ by: $m_1 \oplus m_2(A) = 1/5$,
$m_1 \oplus m_2(A, B) = 1/5$, $m_1 \oplus m_2(B) = 3/5$. Besides, the logical interpretations of $m_1$
and $m_2$ are: $ks(m_1) = [A \vee B, B]$ and $ks(m_2) = [A, A \vee B, B]$. We can then
compute $ks(m_1) \oplus ks(m_2)$ and get: $ks(m_1) \oplus ks(m_2) = [A, A \vee B, B, B, B]$. We
can easily check that: $ks(m_1 \oplus m_2) \leftrightarrow ks(m_1) \oplus ks(m_2)$.

To sum up this section, we can say that the basic assignment $m_1 \oplus m_2$,
provided by Dempster’s rule on two assignments $m_1$ and $m_2$, can be logically
interpreted by the knowledge-set denoted $ks(m_1) \oplus ks(m_2)$ previously defined,
$ks(m_1)$ and $ks(m_2)$ being the knowledge-sets which logically interpret $m_1$ and
$m_2$.
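
The agreement between the two operators can be checked mechanically on the example of this section. The sketch below is illustrative only: it assumes positive clauses are encoded as `frozenset`s of hypotheses, so that under EV the conjunction of two clauses is equivalent to the clause over their intersection, and the names `dempster` and `combine_ks` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def dempster(m1, m2):
    """Dempster's rule on two mass functions given as {focal set: mass}."""
    raw = Counter()
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        if a & b:  # keep only pairs with a non-empty intersection
            raw[a & b] += x * y
    total = sum(raw.values())  # the normalisation constant N
    if total == 0:
        raise ValueError("the two assignments are in total conflict")
    return {focal: mass / total for focal, mass in raw.items()}


def combine_ks(ks1, ks2):
    """Knowledge-set counterpart: one clause per consistent pair of clauses."""
    combined = [k1 & k2 for k1, k2 in product(ks1, ks2) if k1 & k2]
    if not combined:
        raise ValueError("the two knowledge-sets are in total conflict")
    return combined


if __name__ == "__main__":
    A, AB, B = frozenset("A"), frozenset("AB"), frozenset("B")
    third, half = Fraction(1, 3), Fraction(1, 2)
    print(dempster({AB: half, B: half}, {A: third, AB: third, B: third}))
    print(combine_ks([AB, B], [A, AB, B]))
```

Counting the clauses of the combined knowledge-set and dividing by its size gives back the masses computed by `dempster`, which is the content of Proposition 5 on this instance.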

2.4 Maximum of plausibility Decision Strategy 

An assignment, obtained by combination or not, thus implicitly defines several
orders between the hypotheses of the frame of discernment. Indeed, one can order the
hypotheses according to their degree of belief, or to their degree of plausibility,
or to their pignistic probability [18].

Let us focus here on the Maximum of plausibility Decision Strategy (MPDS),
which consists in selecting the hypotheses which are the most plausible (i.e., which
have the greatest degree of plausibility).

We can show that, under EV, any hypothesis corresponds to only one world
in $\|EV\|$⁵, the one in which this hypothesis (and only this one) is true. So we can
define the degree of plausibility of a model of EV by the degree of plausibility of
the only hypothesis it satisfies. Thus, the MPDS is equivalent to selecting
the models of EV which are the most plausible. Let us denote by $Maxpl(\|EV\|, m)$
the set of worlds in $\|EV\|$ which are selected by the MPDS defined by assignment
$m$, i.e., the most plausible (for $m$) models of EV.

⁵ $\|F\|$ denotes the set of interpretations which satisfy $F$, i.e., the set of models of $F$.

3 Maximum of plausibility vs merging operators 

In this section we show that the MPDS is equivalent to a particular merging
operator defined by Konieczny and Pino-Perez.

3.1 The merging operator Sum 

Let us recall here some results given in [11] and [9]. Notice that we focus only
on some results and change the notations.

Definition 7. Let $w$ and $w'$ be two interpretations of a propositional language.
The drastic distance $d(w, w')$ between $w$ and $w'$ is defined by: $d(w, w') = 0$
iff $w = w'$ and $d(w, w') = 1$ iff $w \neq w'$. Let $w$ be an interpretation and $K$ a formula.
The distance between $w$ and $K$ is defined by: $d(w, K) = Min_{w' \in \|K\|} d(w, w')$.
Let $w$ be an interpretation and $M$ a multi-set of formulas. One can define a distance,
denoted here $d_{sum}$, between $w$ and $M$ by: $d_{sum}(w, M) = \sum_{K_i \in M} d(w, K_i)$

Definition 8. Let $M$ be a multi-set of formulas. One can define a pre-order
on the propositional interpretations by:

$w \leq_{d_{sum}(M)} w'$ iff $d_{sum}(w, M) \leq d_{sum}(w', M)$

Definition 9. Let $M$ be a multi-set of formulas and $IC$ a set of formulas
considered as integrity constraints. A merging operator, denoted here Sum, is
defined by:

$Sum(M, IC) = Min_{\leq_{d_{sum}(M)}} \|IC\|$

In other terms, the result of merging, under constraints $IC$, of the different
formulas in $M$ by the operator Sum is semantically characterized by the models
of $IC$ whose distance to $M$, defined by $d_{sum}$, is minimal.

Konieczny has shown that the operator Sum is both a majority merging operator
and an arbitration one, and thus satisfies the postulates which characterize
such operators.
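
Definitions 7 to 9 can be made concrete with a brute-force sketch over all interpretations of a small language. It is an illustration under assumptions: formulas and the integrity constraint are encoded as Python predicates over a world (a dict from letters to truth values), every formula in the multi-set and the constraint are assumed consistent, and the names `worlds`, `d_sum` and `merge_sum` are hypothetical. The multi-set used in the usage lines is also made up for the example.

```python
from itertools import product


def worlds(letters):
    """All interpretations of the given propositional letters."""
    return [dict(zip(letters, values))
            for values in product([True, False], repeat=len(letters))]


def d_sum(world, formulas):
    """Sum of drastic distances: 0 for a formula the world satisfies, else 1.

    This shortcut is only valid for consistent formulas.
    """
    return sum(0 if formula(world) else 1 for formula in formulas)


def merge_sum(formulas, constraint, letters):
    """Models of the constraint whose d_sum distance to the multi-set is minimal."""
    candidates = [w for w in worlds(letters) if constraint(w)]
    best = min(d_sum(w, formulas) for w in candidates)
    return [w for w in candidates if d_sum(w, formulas) == best]


if __name__ == "__main__":
    # Hypothetical multi-set: a, a and b, not a.
    M = [lambda w: w["a"], lambda w: w["a"] and w["b"], lambda w: not w["a"]]
    print(merge_sum(M, lambda w: True, ["a", "b"]))
    print(merge_sum(M, lambda w: not w["b"], ["a", "b"]))
```

With the drastic distance, $d_{sum}$ simply counts the formulas a world violates, which is why the operator behaves like a majority vote over the sources.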

3.2 Relation between MPDS and Sum 

The relation between the Maximum Plausibility Decision Strategy and the merg- 
ing operator Sum is given by the following result: 

Proposition 6. $Maxpl(\|EV\|, m) = Sum(ks(m), EV)$

Sketch of proof. We first prove the following lemma:

Let $K$ be a propositional formula which is not a contradiction. Let $w_i \in \|EV\|$
and $H_i$ the unique propositional letter such that $\|H_i\| = \{w_i\}$. Then:

(a) $d(w_i, K) = 0$ iff $EV \cup \{K \wedge H_i\}$ is consistent

(b) $d(w_i, K) = 1$ iff $EV \cup \{K \wedge H_i\}$ is inconsistent

Then we show that, if $w_1$ and $w_2$ are two models of EV, then

$d_{sum}(w_1, ks(m)) \leq d_{sum}(w_2, ks(m))$ iff $Pl_m(w_1) \geq Pl_m(w_2)$

This proposition shows that, given a basic assignment $m$, the Maximum
of Plausibility Decision Strategy amounts to merging, by the operator Sum, the
knowledge-set which logically represents this assignment. That is, the Maximum
Plausibility Decision Strategy is a merging operator which is both a majority
and an arbitration one and thus satisfies the postulates of this kind of operator.
Due to space limitations, we cannot give the reformulation of these postulates in
our setting, but this can be found in [5].
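
As an informal check of Proposition 6, the sketch below compares both sides on the running example $m(A) = 2/3$, $m(A, B) = 1/3$. It assumes the encoding in which the world $w_i$ is identified with the hypothesis $H_i$ and a positive clause is the set of hypotheses it mentions; all function names are hypothetical.

```python
from fractions import Fraction


def plausibility(h, clauses):
    """Share of clauses consistent with hypothesis h under EV."""
    return Fraction(sum(1 for k in clauses if h in k), len(clauses))


def d_sum(h, clauses):
    """Number of clauses violated by the world in which only h is true."""
    return sum(0 if h in k else 1 for k in clauses)


def max_pl(frame, clauses):
    """Hypotheses selected by the Maximum of Plausibility Decision Strategy."""
    best = max(plausibility(h, clauses) for h in frame)
    return {h for h in frame if plausibility(h, clauses) == best}


def merge_sum(frame, clauses):
    """Worlds of EV at minimal d_sum distance from the knowledge-set."""
    best = min(d_sum(h, clauses) for h in frame)
    return {h for h in frame if d_sum(h, clauses) == best}


if __name__ == "__main__":
    frame = {"A", "B"}
    ks_m = [frozenset("A"), frozenset("A"), frozenset("AB")]
    print(max_pl(frame, ks_m) == merge_sum(frame, ks_m))
```

In this encoding the plausibility of a hypothesis equals one minus its $d_{sum}$ distance divided by the size of the knowledge-set, so maximising one is the same as minimising the other.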

4 Merging operators vs Maximum of plausibility 

In this section, we prove some symmetrical results. We show that any knowledge-set
KS can be represented, in Theory of Evidence, by a basic assignment $m_{KS}$.
Then, we show that the merging operator Sum applied on KS under constraints
IC characterizes a formula which corresponds to what the MPDS provides when
defined by $m_{KS}$ and when considering IC.

4.1 Representation of a knowledge-set by a basic assignment 

Let L be a finite propositional language and let $w_1, ..., w_N$ be its interpretations.

We can define a frame of discernment $\Theta_L$ with $N$ hypotheses denoted $H_1, ..., H_N$,
which are exclusive and exhaustive. These hypotheses are defined by isomorphism
from the interpretations $w_1, ..., w_N$ but, to simplify the presentation, we will
omit this isomorphism: by convention, the world $w_i$ is associated with the hypothesis
$H_i$.

Definition 10. Let $K$ be a consistent set of formulas of L and let $\|K\| =
\{w_{i_1}, ..., w_{i_k}\}$ be the set of its models. We note $t(K)$ the set of hypotheses defined
by $t(K) = \{H_{i_1}, ..., H_{i_k}\}$.

Example. Let L be a propositional language whose letters are $a$ and $b$. The
possible worlds are $w_1 = \{a, b\}$, $w_2 = \{a, \neg b\}$, $w_3 = \{\neg a, b\}$ and $w_4 = \{\neg a, \neg b\}$.
$\Theta_L$ is thus the frame of discernment whose hypotheses are $H_1, H_2, H_3, H_4$. Let
us now consider the formula $a \vee \neg b$. Then, $t(a \vee \neg b) = \{H_1, H_2, H_4\}$.

Definition 11. Let KS be a knowledge-set of L containing only consistent
formulas. We denote $m_{KS}$ the basic assignment which associates any set of
hypotheses $A$ of $\Theta_L$ with the proportion of formulas $K$ in KS such that $t(K) = A$.

One can check that definition 11 indeed characterizes a basic assignment.

Example. Let KS be the knowledge-set made of the three formulas: $K_1 =
a \wedge b$, $K_2 = a \wedge c$, $K_3 = \neg a$. The possible worlds are: $w_1 = \{a, b, c\}$, $w_2 =
\{a, b, \neg c\}$, ..., $w_8 = \{\neg a, \neg b, \neg c\}$. Let $H_1 ... H_8$ be the hypotheses of $\Theta_L$. We have:
$\|K_1\| = \{w_1, w_2\}$. Thus, $t(K_1) = \{H_1, H_2\}$. $\|K_2\| = \{w_1, w_3\}$, then $t(K_2) =
\{H_1, H_3\}$. Finally, $\|K_3\| = \{w_5, w_6, w_7, w_8\}$ and thus $t(K_3) = \{H_5, H_6, H_7, H_8\}$.
This defines the following basic assignment: $m_{KS}(H_1, H_2) = m_{KS}(H_1, H_3) =
m_{KS}(H_5, H_6, H_7, H_8) = 1/3$.
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4.2 Relation between Sum and MPDS 

Proposition 7. $t(Sum(KS, IC)) = Maxpl(t(IC), m_{KS})$

Sketch of proof.

We first prove the following lemma: let $w$ be an interpretation of L and let $H$
be its associated hypothesis in $\Theta_L$. Then, $Pl_{m_{KS}}(H)$ is equal to the proportion,
in KS, of formulas of which $w$ is a model.

Then we prove that, if $w_1$ and $w_2$ are two interpretations of L and if $H_1$ and
$H_2$ are the hypotheses of $\Theta_L$ which are respectively associated with them, then:
$d_{sum}(w_1, KS) \leq d_{sum}(w_2, KS)$ iff $Pl_{m_{KS}}(H_1) \geq Pl_{m_{KS}}(H_2)$

In other words, merging the knowledge-set KS under constraints IC with
operator Sum is equivalent to selecting the most plausible hypotheses of $t(IC)$
according to the assignment $m_{KS}$.

Example. Let us take again the knowledge-set KS defined by the three
formulas: $K_1 = a \wedge b$, $K_2 = a \wedge c$, $K_3 = \neg a$. Assume that $IC = \emptyset$. We have:
$Sum(KS, IC) = \{a, b, c\}$ and $t(Sum(KS, IC)) = \{H_1\}$. The assignment associated
with KS, $m_{KS}$, is defined by:

$m_{KS}(H_1, H_2) = m_{KS}(H_1, H_3) = m_{KS}(H_5, ..., H_8) = 1/3$.

Since $IC = \emptyset$, $\|IC\| = \{w_1, ..., w_8\}$ and thus $t(IC) = \{H_1, ..., H_8\}$. Among
these hypotheses, the most plausible according to $m_{KS}$ is $H_1$. Thus,
$t(Sum(KS, IC)) = Maxpl(t(IC), m_{KS})$.
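
The example can be replayed with a short brute-force sketch. It is only illustrative: formulas are encoded as Python predicates over worlds, hypotheses are represented by their index $i$ (so `1` stands for $H_1$), an empty IC is modelled by a constraint that is always true, and the names `t`, `m_ks`, `plausibility`, `max_pl` and `merge_sum` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

LETTERS = ("a", "b", "c")
# Worlds in the order used by the example: w1 = {a, b, c}, w2 = {a, b, not c}, ...
WORLDS = [dict(zip(LETTERS, values)) for values in product([True, False], repeat=3)]


def t(formula):
    """Indices of the hypotheses associated with the models of the formula."""
    return frozenset(i + 1 for i, w in enumerate(WORLDS) if formula(w))


def m_ks(formulas):
    """Basic assignment of Definition 11: proportion of formulas per focal set."""
    counts = Counter(t(f) for f in formulas)
    return {focal: Fraction(n, len(formulas)) for focal, n in counts.items()}


def plausibility(i, m):
    """Plausibility of hypothesis i: total mass of the focal sets containing it."""
    return sum((mass for focal, mass in m.items() if i in focal), Fraction(0))


def max_pl(hypotheses, m):
    best = max(plausibility(i, m) for i in hypotheses)
    return {i for i in hypotheses if plausibility(i, m) == best}


def merge_sum(formulas, constraint):
    """Hypothesis indices of the models of the constraint with minimal d_sum."""
    def d_sum(i):
        return sum(0 if f(WORLDS[i - 1]) else 1 for f in formulas)

    candidates = t(constraint)
    best = min(d_sum(i) for i in candidates)
    return {i for i in candidates if d_sum(i) == best}


if __name__ == "__main__":
    KS = [lambda w: w["a"] and w["b"], lambda w: w["a"] and w["c"], lambda w: not w["a"]]
    no_constraint = lambda w: True
    print(merge_sum(KS, no_constraint), max_pl(t(no_constraint), m_ks(KS)))
```

Both calls select the same hypothesis index on this instance, in line with Proposition 7; the enumeration of all worlds is exponential in the number of letters and is meant only for small examples.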

5 Conclusion 

The contributions of this work are the following. First, it shows a possible way 
for interpreting in logical terms, the basic notions of Theory of Evidence when 
numbers are rational. In this case, it shows the equivalence between assignments 
and multi-sets of propositional formulas. Then, it shows the equivalence between 
the Maximum of Plausibility Decision Strategy provided by Theory of Evidence 
and the merging operator, here denoted Sum, provided for merging knowledge- 
bases. 

The equivalence between the two theories being established, one consequence 
is that it is now possible to export some notions from one theory to the other. 

In particular, the logical interpretation of Dempster’s rule of combination
defines an operator on multi-sets of formulas, denoted $\oplus$ in section 2.3, which
aims at combining two multi-sets of formulas. Such an operator has, as far as
we know, never been studied and it would be interesting to study its properties
and compare it to the other knowledge-base merging operators.

Conversely, the multi-set union operator suggests a new rule for combining
two assignments. It can be shown that this rule is commutative and associative,
and is always applicable, even if the two assignments are in total conflict, which
shows the interest of such a rule of combination. Studying the properties of this
rule of combination is the subject of current research.

Finally, comparing other decision strategies (Maximum of Credibility and
Maximum of Pignistic probability for instance) and other knowledge-base merging
operators remains to be done. This constitutes another direction of research.

Abstract. The problem of revision is to find which formula $\psi$ can be deduced
from a formula $\varphi$, which has been added to a Knowledge Base KB. Since $\varphi$ can
bring inconsistency to KB, non-monotonic inference relations which are able to
deal with inconsistency have been proposed; note that classical revision takes
place after the arrival of $\varphi$. The aim of this paper is to propose a priori revision,
that is, to provide a way to "armor" the KB by suppressing some knowledge and
by forbidding the acceptance of some new information in such a way that adding any
allowed formula $\varphi$ to the revised KB will not bring inconsistency.



1 Introduction 

A lot of researchers have studied inconsistency handling in knowledge bases (KB for
short). The KB is used to describe a system and to deduce new information about it.
The difficulty is to reason with an inconsistent KB because the possible deductions
become trivial; if we do not want to throw away the whole KB we have to handle
inconsistency. A particular problem is the insertion of a new formula in an initially
consistent KB; reasoning with the KB after the arrival of this new formula is called
revision [Alchourrón&al85, Winslett88, Katsuno-Mendelzon91]. So, the problem of
revision is to find which formula $\psi$ can be deduced from a formula $\varphi$ that has been
added to the KB. The inference cannot be classical inference since $\varphi$ can bring
inconsistency to the KB. This is why many researchers have proposed so-called
non-monotonic inference relations which are able to deal with inconsistency. Those
non-monotonic inference relations use some preference relations that select the most
interesting consistent sub-theory(ies) of $\varphi \cup KB$ in which classical deduction can be
applied. Note that classical revision takes place after the arrival of a new piece of
information $\varphi$, so this revision can be called a posteriori revision.

The aim of this paper is to propose a way to perform a priori revision. In a priori
revision, we want to provide a way to "armor" the KB by suppressing some rules and
by forbidding the acceptance of some new information in such a way that adding any allowed
formula $\varphi$ to the revised KB will not bring inconsistency. Consequently, in the revised
KB, the classical monotonic inference relation will always be usable. In this work, we
distinguish between input variables, which can compose a new piece of information, and other
variables; we also restrict a new piece of information to be a conjunction of input literals. We
propose to examine the initial KB to provide a set of armored KBs such that each one
will be consistent with any conjunction $\varphi$ of allowed input literals. A diagnosis is
composed of a set of formulas that must be removed from the KB and a set of
integrity constraints which define valid new information for the KB; those integrity
constraints provide a way to eliminate some formulas from the set of possible arriving
new formulas. Applying a diagnosis to a KB is called armoring the KB. One
difficulty is that many such diagnoses can exist. So, we propose to use a penalty
preference relation [Dupin&all94] in order to select preferred diagnoses and thus
preferred armored KBs.

This paper is organized as follows. In a first part, we define a priori revision. In the 
second part, we present a preference relation on diagnoses, based on penalty theory, 
in order to provide a way to choose a best diagnosis, to obtain a best armored KB. In 
the last part we propose algorithms to compute the diagnoses and their associated 
penalty cost. 



2 How to Armor a Knowledge Base? 

In the following, we denote by L a finite propositional language. Elements of L, or
formulas, are denoted by Greek letters. An interpretation in L is an assignment of a
truth value in {T, F} to each formula of L in accordance with the classical rules of
propositional calculus. A literal is an atomic variable $p$ or its negation $\neg p$. An
interpretation $\omega$ is a model of a formula $\alpha$ ($\omega \models \alpha$) iff $\omega(\alpha) = T$. A formula $\beta$ is called a
logical consequence of $\alpha$ ($\alpha \models \beta$) iff each model of $\alpha$ is a model of $\beta$. A formula $\alpha$ is
said to be consistent iff it has at least one model. Any inconsistent formula can be
denoted by $\bot$. A knowledge base KB is a set of logical formulas. Non-monotonic
inference relations will be denoted by $\mid\!\sim$.

The problem of revision is to decide if, given a knowledge base KB composed by 
logical formulas and a new information tp, we can deduce \|/, denoted by tp |^kb V- 
The a posteriori revision selects a set of consistent subsets KB; of KB such for 
each subset KB;, KBi u tp |- \|/, which is noted tp |^kb V- The point is to define a 
preference relation which is able to select the most interesting preferred consistent 
subsets. In order to discriminate between the consistent subsets of KB, some 
approaches [Rescher64, Brewka89, Nebel91, Dubois&all92, Benferhat&al93, 
Lehmann92, Cayrol-Lagasquie95] consist in ranking the KB into priority levels and 
maximizing the set or the number of formulas satisfied at each level starting from the 
highest priority level. An important aspect of this kind of approach is that violating 
however many formulas at a given level is always more acceptable than violating only 
one formula at a strictly higher level; thus, these approaches are non-compensatory, 
i.e., levels never interact. An alternative approach, called penalty approach, 
[Pinkas91, Dupin&all94] is to weight the formulas of the KB with positive numbers 
called penalties. Intuitively, the penalty associated to a formula represents the 
importance of the formula, the higher it is, the more important is the formula and the 
more difficult it will be to reject this formula. Inviolable formulas are given an infinite 
penalty. Contrary to priorities, penalties are compensatory since they are additive:
the cost associated to a subset of formulas of a KB is the sum of the penalties of the
rejected formulas. The subsets having a minimum cost are the preferred subsets of KB.
Notice that in all these approaches, φ has a maximal priority.

We now present the framework we use in order to make a priori revision. We define a
set of input variables which is a subset of the variables of L. An input literal is an
input variable or its negation; we note I this set of input literals. The aim of a priori
revision is to compute a revised KB, denoted by D(KB), so that for any valid
conjunction of input literals φ which will be added in the future, D(KB) ∪ φ will be
consistent; hence the classical monotonic inference relation will always be usable
with φ. Such a revision of KB is made by defining a diagnosis. A diagnosis is
composed of a set of formulas that must be removed from the KB and of a set of
integrity constraints which define valid new information for the KB. An integrity
constraint is a formula (l_1 ∧ ... ∧ l_n → ⊥), where each l_i is an input literal. Such a
constraint means that l_1 ∧ ... ∧ l_n cannot be added to the KB. To simplify the notations,
the constraint formula (l_1 ∧ ... ∧ l_n → ⊥) will be represented by its set of literals
{l_1, ..., l_n}.

We consider the following restrictions: the possible new information is a conjunction 
of input literals and the knowledge base is a set of Horn clause formulas where the 
positive literal in the clause is not an input literal (we can represent these Horn clauses 
by implicative formulas where input facts can only occur in the premises). This 
framework allows us to consider Modus Ponens as the unique inference relation 
(more formally, with our restrictions, if φ is a conjunction of input literals, then φ ∪ KB
classically infers ψ if and only if φ ∪ KB infers ψ by Modus Ponens).

Definition 1 - Diagnosis of a Knowledge Base 

Let KB be a knowledge base; I a set of input literals. Let D be a pair <E_D, r_D>, where
r_D is a subset of KB and E_D is a set of literal sets {{l_11, ..., l_1n}, ..., {l_p1, ..., l_pm}} that
represents a set of p integrity constraints, called R_ED.

D is a diagnosis for KB if, for every conjunction j of input literals consistent with the
integrity constraints R_ED, {j} ∪ KB \ r_D is consistent.

Example 1. Let us consider the following example: Quakers (Qua) are Pacifists (Pac), 
Republicans (Rep) are not pacifists, Republicans are American (Am), Americans like 
Baseball (Bball), and Republicans do not like Baseball. With this knowledge base
KB_1, if a new piece of information arrives and states that Nixon is both a Quaker and a
Republican, it is possible to deduce that Nixon is both pacifist and not pacifist, a
contradiction that we want to avoid.

r1: Qua → Pac      r2: Rep → ¬Pac      r3: Rep → Am

r4: Am → Bball     r5: Rep → ¬Bball

If the set of input variables is {Rep, Qua}, then D_0 = <{}, {r1, r2, r3, r4, r5}>, D_1 = <{},
{r1, r3}>, D_7 = <{{Rep}}, {}> and D_9 = <{{Rep, Qua}}, {r4}>, for instance, are
possible diagnoses. The computation of diagnoses will be explained in section 4.

An armored KB is a KB on which a diagnosis has been applied.

Definition 2 - Armoring a Knowledge Base

Let D(KB) be the knowledge base KB armored by D = <E_D, r_D>; D(KB) corresponds
to KB from which the rules of r_D have been deleted and to which the integrity
constraints of R_ED are added: D(KB) = R_ED ∪ KB \ r_D.

Example 1. If we consider D_9, D_9(KB_1) = {Rep ∧ Qua → ⊥} ∪ {r1, r2, r3, r5}. This
means that, with D_9(KB_1), the new information "Nixon is both Republican and Quaker",
represented by Rep ∧ Qua, is forbidden; for any conjunction of input literals j that is
not forbidden, j ∪ {r1, r2, r3, r5} is consistent.
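
Definition 1 can be checked by brute force on this small example. The following Python sketch is illustrative only and is not the algorithm of section 4: the encoding of literals as strings with a "~" prefix, and the names RULES, closure and is_diagnosis, are assumptions introduced here. It enumerates every consistent set of input literals, skips those forbidden by an integrity constraint, and verifies that forward chaining (Modus Ponens) on the remaining rules never derives a literal together with its negation.

```python
from itertools import product

# Hypothetical encoding of KB_1: rule name -> (set of premises, conclusion).
RULES = {
    "r1": ({"Qua"}, "Pac"),
    "r2": ({"Rep"}, "~Pac"),
    "r3": ({"Rep"}, "Am"),
    "r4": ({"Am"}, "Bball"),
    "r5": ({"Rep"}, "~Bball"),
}
INPUT_VARS = ["Qua", "Rep"]


def neg(lit):
    """Return the complementary literal."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def closure(facts, rules):
    """Forward chaining: all literals derivable by Modus Ponens."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def fact_bases(input_vars):
    """Every consistent set of input literals (each variable absent, true or false)."""
    for choice in product((None, True, False), repeat=len(input_vars)):
        yield frozenset(
            v if c else "~" + v
            for v, c in zip(input_vars, choice)
            if c is not None
        )


def is_diagnosis(constraints, removed):
    """Check Definition 1 for the pair <constraints, removed> on KB_1."""
    kept = [rule for name, rule in RULES.items() if name not in removed]
    for fb in fact_bases(INPUT_VARS):
        if any(set(c) <= fb for c in constraints):
            continue  # fb is forbidden by an integrity constraint
        derived = closure(fb, kept)
        if any(neg(lit) in derived for lit in derived):
            return False
    return True
```

With this encoding, is_diagnosis([{"Rep", "Qua"}], {"r4"}) accepts D_9, while the pair with no constraint and no removed rule is rejected.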



3 How to Choose the Best Armoring ? 

Definition 3 - A Priori Revision 

A priori revising a KB consists in providing a preference relation on the possible 
diagnoses for a KB. 

If several diagnoses are equally preferred, then either we define, as
in a posteriori revision, that a formula ψ can be inferred from a formula φ if it can be
inferred from all the preferred armored KBs to which φ is added, or we select any
preferred armored KB. A main difficulty is to choose among several possible
diagnoses. We propose to use a preference ordering on diagnoses. First, we prefer, and
so only consider, minimal diagnoses. A minimal diagnosis is a diagnosis that leads to
a minimal change in the corresponding armored KB. Second, we use a penalty
approach that provides criteria to prefer the diagnoses that reject or make useless the
least important formulas of KB.



3.1 Minimal Change Diagnosis

Definition 4 - Minimal Diagnosis

A diagnosis <E_D, r_D> is minimal if there does not exist another diagnosis <E_D', r_D'>
verifying: r_D' ⊆ r_D, and (E_D' ⊆ E_D or ∀ F' ∈ E_D', ∃ F ∈ E_D such that F ⊆ F').

Examples. 1) For the preceding example KB_1, there are 10 minimal diagnoses:

D1 = <{}, {r1, r3}>;  D2 = <{}, {r1, r4}>;  D3 = <{}, {r1, r5}>;  D4 = <{}, {r2, r3}>;

D5 = <{}, {r2, r4}>;  D6 = <{}, {r2, r5}>;  D7 = <{{Rep}}, {}>;

D8 = <{{Rep, Qua}}, {r3}>;  D9 = <{{Rep, Qua}}, {r4}>;  D10 = <{{Rep, Qua}}, {r5}>

2) Let us suppose that a knowledge base has the three following diagnoses: D1 =
<{{a, b}, {a, c}}, {r1}>, D2 = <{{a, b}}, {r1}>, D3 = <{{a}}, {r1}>.

D2 is minimal; D1 and D3 are not minimal. D1 is not minimal because it is not
necessary to forbid the conjunction of the literals a and c to have a diagnosis; D3 is
not minimal because D2 shows that it is not necessary to forbid all the interpretations
satisfying a: it is sufficient to forbid the interpretations satisfying both a and b.

Note that the minimality principle is not interesting for comparing equivalent (in
terms of models) diagnoses. For instance, between two diagnoses D1 = <{{a, b}, {a,
¬b}}, {r1}> and D2 = <{{a}}, {r1}>, the minimality criterion leads to prefer D1, but
in fact the two sets of constraints are equivalent.
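
The comparison of Definition 4 can be sketched in a few lines of Python. This is an illustrative reading, not the authors' code: the inclusion symbols of the definition are assumed to be non-strict (⊆), a diagnosis is represented as a pair of frozensets, and the names diag, dominates and minimal are hypothetical.

```python
def diag(constraints, rules):
    """Build a diagnosis <E_D, r_D> as a hashable pair of frozensets."""
    return (frozenset(frozenset(c) for c in constraints), frozenset(rules))


def dominates(other, d):
    """True if `other` shows that `d` is not minimal (Definition 4, read with ⊆)."""
    (e_other, r_other), (e_d, r_d) = other, d
    if other == d or not r_other <= r_d:
        return False
    fewer_constraints = e_other <= e_d
    # Every constraint of `other` is implied by a more general constraint of `d`.
    more_specific = all(any(f <= f_prime for f in e_d) for f_prime in e_other)
    return fewer_constraints or more_specific


def minimal(diagnoses):
    """Keep the diagnoses that no other diagnosis dominates."""
    return [d for d in diagnoses if not any(dominates(o, d) for o in diagnoses)]


# Example 2 of the text.
D1 = diag([{"a", "b"}, {"a", "c"}], {"r1"})
D2 = diag([{"a", "b"}], {"r1"})
D3 = diag([{"a"}], {"r1"})
```

On the second example above, minimal([D1, D2, D3]) keeps only D2. The same function also reproduces the remark on equivalent constraint sets: <{{a, b}, {a, ¬b}}, {r1}> dominates <{{a}}, {r1}> although both forbid the same fact bases.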

We have defined an order relation between diagnoses and the associated minimality 
criterion. However, this relation only defines a partial order that we propose to refine 
by using a penalty approach. The penalty approach can rank diagnoses by comparing 
the weights of the deleted or useless rules. 



3.2 Uselessness of a Rule in an Armored KB

A diagnosis explicitly excludes some rules from the knowledge base. It may also 
happen that some rules become useless in the revised knowledge base because they 
can never be fired. A rule cannot be fired if its conditions correspond to an impossible 
conjunction of input literals or if some of its conditions cannot be proved after the 
deletion of the rules of r_D. If a rule becomes useless after application of a diagnosis, we
can consider that the information encapsulated in this rule is, in some way, suppressed 
from the knowledge base. So, in order to compare armored KB, it is important to 
know if the rules that are kept in the armored KB are useful. 

Definition 5 - Useless Rule Set for a Diagnosis D 

Let D = <E_D, r_D> be a diagnosis for KB.

A Horn clause r is useless for D iff there is no conjunction j of input literals,
consistent with R_ED, such that the premises of r can be deduced from {j} ∪ KB \ r_D.
We call URS(D), useless rule set of D, the set of all useless rules of KB for D.

Example 1. D1 = <{}, {r1, r3}>; D1(KB) = {r2, r4, r5}; URS(D1) = {r4}. r4 (Am →
Bball) is useless because Am is not an input literal, and it cannot be deduced from
{r2, r4, r5} with any input base.

Proposition: Minimality and Uselessness

If D = <E_D, r_D> is a diagnosis for KB, and if r_i in r_D is useless for D, then D is not
minimal.

This property means that minimality and uselessness are complementary notions to 
evaluate diagnoses. 



3.3 The Penalty Approach 

For any formula φ_i of the KB, there is an associated penalty α(φ_i) that represents a
degree of confidence in φ_i; it will be understood as the cost that the user must pay in
order to discard the formula φ_i. Let us present the penalty preference on diagnoses. In
the basic penalty approach, the philosophy consists in paying α(φ_i) when a formula φ_i
of the initial knowledge base is discarded; we propose to extend this by taking into
account formulas which become useless.

Definition 6 - Cost of a Diagnosis

Let D = <E_D, r_D> be a diagnosis for KB.

The cost of the diagnosis D, called C(D), is the sum of the penalties associated to the
rules of KB which are deleted or made useless by D:

C(D) = Σ_{φ_i ∈ r_D ∪ URS(D)} α(φ_i)



Definition 7 - Cost Preference

Let KB be a penalty knowledge base. Let D1 and D2 be two minimal diagnoses of KB.
D1 is penalty-preferred to D2 iff C(D1) < C(D2).

Example 1. The set of input facts is {Qua, Rep}. We associate a penalty to each rule.

r1: Qua → Pac    α1 = 5      r2: Rep → ¬Pac    α2 = 5      r3: Rep → Am    α3 = 100

r4: Am → Bball   α4 = 5      r5: Rep → ¬Bball  α5 = 7

The penalty associated to r3 means that this rule is very important. 

For each of the minimal diagnoses presented before, we give its cost; the method for 
computing these costs will be presented in the next section. 

D1 = <{}, {r1, r3}>  C(D1) = 110 (r4 is useless);    D2 = <{}, {r1, r4}>  C(D2) = 10;
D3 = <{}, {r1, r5}>  C(D3) = 12;                     D4 = <{}, {r2, r3}>  C(D4) = 110 (r4 is useless);

D5 = <{}, {r2, r4}>  C(D5) = 10;                     D6 = <{}, {r2, r5}>  C(D6) = 12;

D7 = <{{Rep}}, {}>  C(D7) = 117 (r2, r3, r4, r5 are useless);

D8 = <{{Rep, Qua}}, {r3}>  C(D8) = 105 (r4 is useless);

D9 = <{{Rep, Qua}}, {r4}>  C(D9) = 5;                D10 = <{{Rep, Qua}}, {r5}>  C(D10) = 7.

So, the penalty-preferred minimal diagnosis is D9, and the associated armored KB is:

Rep ∧ Qua → ⊥      r1: Qua → Pac      r2: Rep → ¬Pac

r3: Rep → Am       r5: Rep → ¬Bball

It means that if a person is a Quaker then he is a pacifist, and if a person is a
Republican then he is not a pacifist, is American and does not like baseball. But a person
cannot be both a Quaker and a Republican.

Note that taking into account the uselessness of the rules avoids preferring D7. The
constraint of D7 means that no Republican can exist, which makes several rules useless.
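
The costs listed above can be reproduced with a direct, brute-force reading of Definitions 5 and 6. The Python sketch below is only an illustration under stated assumptions: it does not use the ATMS labels of section 4.2, it enumerates fact bases made of positive input literals only (negative input literals fire no rule in this example), and the names RULES, PENALTY, useless_rules and cost are hypothetical.

```python
from itertools import combinations

# Hypothetical encoding of KB_1 and of the penalties of Example 1.
RULES = {
    "r1": ({"Qua"}, "Pac"),
    "r2": ({"Rep"}, "~Pac"),
    "r3": ({"Rep"}, "Am"),
    "r4": ({"Am"}, "Bball"),
    "r5": ({"Rep"}, "~Bball"),
}
PENALTY = {"r1": 5, "r2": 5, "r3": 100, "r4": 5, "r5": 7}
INPUTS = ("Qua", "Rep")


def closure(facts, rules):
    """Forward chaining: all literals derivable by Modus Ponens."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def allowed_fact_bases(constraints):
    """Fact bases (positive input literals only) not forbidden by a constraint."""
    for size in range(len(INPUTS) + 1):
        for combo in combinations(INPUTS, size):
            fb = frozenset(combo)
            if not any(set(c) <= fb for c in constraints):
                yield fb


def useless_rules(constraints, removed):
    """URS(D): rules whose premises are derivable from no allowed fact base."""
    kept = [rule for name, rule in RULES.items() if name not in removed]
    closures = [closure(fb, kept) for fb in allowed_fact_bases(constraints)]
    return {
        name
        for name, (premises, _) in RULES.items()
        if not any(premises <= known for known in closures)
    }


def cost(constraints, removed):
    """C(D): sum of the penalties of the removed or useless rules."""
    charged = set(removed) | useless_rules(constraints, removed)
    return sum(PENALTY[name] for name in charged)
```

For instance, cost([{"Rep", "Qua"}], {"r4"}) charges only r4, whereas cost([{"Rep"}], set()) also charges r2, r3, r4 and r5, which can no longer be fired once Rep is forbidden.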

A difficulty in applying this approach is to obtain the penalties for all the rules. They
can be given by an expert. If no penalty is given, each formula can be associated with
a penalty of 1; this amounts to counting the number of formulas. An
automatic approach can be to associate a penalty to each rule using heuristics. If the
KB represents a default behavior of some components, penalties can be proportional
to the probabilities associated to a faulty component, as in [de Kleer-Williams87].



4 Algorithms 

Two algorithms are presented. The first one computes the minimal diagnoses of a
knowledge base. The second one determines the cost of each diagnosis.

4.1 Diagnosis Computation

The computation of diagnoses can be made in two steps using first an ATMS 
[deKleer86] and second an algorithm that extends [Reiter87]. The first step computes 
the minimal characterizations of the potential inconsistencies of the KB: such a 
characterization is a conjunction of input literals and a subset of rules sufficient to 
infer a contradiction. A conjunction of input literals is also called a fact base and will 
be represented as a set of literals. Let FB = {f1, ..., fn} and FB' = {f'1, ..., f'p} be two fact
bases; in the following we denote by FB ⊨ FB' the fact that f1 ∧ ... ∧ fn ⊨ f'1 ∧ ... ∧ f'p.
Notice that [Bouali-Loiseau95] proposes such an algorithm to debug a knowledge 
base and that [Bezzazi&al98] uses a very similar method for a posteriori revision. 

Definition 8 - Characterization 

Let KB be a knowledge base. 

A characterization is a pair <FB, rb> where FB is a set of input literals and rb is a
subset of KB, such that FB ∪ rb ⊨ ⊥.

A characterization is minimal iff there does not exist another characterization
<FB', rb'> such that rb' ⊆ rb and FB ⊨ FB'.

ATMS provides a way to compute for each literal a label that defines the necessary 
and sufficient condition, in terms of assumptions, that provides the deduction of the 
literal. ATMS provides also a mechanism to ensure that all parts of labels, called 
environments, are consistent. 

Definition 9 - Environment and Label 

An environment E is a conjunction of assumptions. 

The label of a literal L is a disjunction of environments (E1 ∨ ... ∨ Ei ∨ ... ∨ En) such that:

- ∀ Ei, Ei ∪ KB ⊭ ⊥: the label is consistent with KB (except if L = ⊥);
- ∀ Ei, Ej with i ≠ j, Ei ⊭ Ej: the label is minimal;
- ∀ Ei, Ei ∪ KB ⊨ L: the label is sound;
- ∀ E, if E ∪ KB ⊨ L then ∃ Ei such that E ⊨ Ei: the label is complete.

So given input literals and rule names as assumptions, and the set of rules KB
(including for each rule its name as an additional premise), ATMS computes for each
literal its label. We denote an environment Ei as composed of Ei_rules, the rules figuring
in Ei, and Ei_facts, the facts figuring in Ei. So, the minimal characterizations are
composed of the Ei_facts part and the Ei_rules part of the environments Ei of ⊥.

Example 1. Assumptions: {Qua, Rep, r1, r2, r3, r4, r5}; Implications: {r1 ∧ Qua →
Pac, r2 ∧ Rep → ¬Pac, r3 ∧ Rep → Am, r4 ∧ Am → Bball, r5 ∧ Rep → ¬Bball}

Label(Pac) = (Qua ∧ r1)            Label(¬Pac) = (Rep ∧ r2)

Label(Am) = (Rep ∧ r3)             Label(Bball) = (Rep ∧ r3 ∧ r4)

Label(¬Bball) = (Rep ∧ r5)         Label(Rep) = (Rep)

Label(Qua) = (Qua)

Label(⊥) = (Rep ∧ r3 ∧ r4 ∧ r5) ∨ (Qua ∧ Rep ∧ r1 ∧ r2). The environment E2 of
Label(⊥) can be noted as E2_rules = {r1, r2} and E2_facts = {Qua, Rep}. There exist two
minimal characterizations, C1 = <{Rep}, {r3, r4, r5}> and C2 = <{Qua, Rep}, {r1, r2}>.
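
For a knowledge base this small, the two minimal characterizations can be recovered without an ATMS. The following Python sketch is a brute-force stand-in, not the label computation described above: it tries every fact base built from positive input literals (an assumption made here for brevity) with every subset of rules, keeps the pairs that derive a contradiction, and filters them with Definition 8. All names are hypothetical.

```python
from itertools import combinations

# Hypothetical encoding of KB_1: rule name -> (set of premises, conclusion).
RULES = {
    "r1": ({"Qua"}, "Pac"),
    "r2": ({"Rep"}, "~Pac"),
    "r3": ({"Rep"}, "Am"),
    "r4": ({"Am"}, "Bball"),
    "r5": ({"Rep"}, "~Bball"),
}
INPUTS = ("Qua", "Rep")


def subsets(items):
    """All subsets of `items`, as frozensets."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def contradictory(fb, rule_names):
    """True if fb together with the named rules derives some p and ~p."""
    rules = [RULES[name] for name in rule_names]
    known = set(fb)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return any("~" + lit in known for lit in known)


def minimal_characterizations():
    """Minimal pairs <FB, rb> with FB ∪ rb ⊨ ⊥ (Definition 8)."""
    chars = [
        (fb, rb)
        for fb in subsets(INPUTS)
        for rb in subsets(RULES)
        if contradictory(fb, rb)
    ]
    # For sets of literals, FB ⊨ FB' holds exactly when FB' ⊆ FB.
    return [
        (fb, rb)
        for fb, rb in chars
        if not any(
            (fb2, rb2) != (fb, rb) and rb2 <= rb and fb2 <= fb
            for fb2, rb2 in chars
        )
    ]
```

The enumeration is exponential in the number of rules, so it only serves to cross-check the example; the label-based computation is what makes the approach practical.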

The diagnoses for a rule base can be computed from the characterizations using the
following theorem.

Theorem

Let KB be a rule base, and C = {C1, ..., Cn} the collection of minimal
characterizations. D = <E_D, r_D> is a diagnosis w.r.t. KB iff
∀ Ci = <E_Ci, r_Ci> of C, either r_Ci ∩ r_D ≠ {} or ∃ {p1, ..., pn} of E_D such that E_Ci ⊨ {p1, ..., pn}.
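
The condition of the theorem is easy to test once the minimal characterizations are known. The Python sketch below hard-codes C1 and C2 from Example 1; the function name and the set-based encoding are assumptions made for illustration. A candidate pair is accepted when every characterization either loses one of its rules or has its fact base forbidden by one of the constraints.

```python
# Minimal characterizations of KB_1 from Example 1: (fact base, rules).
C1 = (frozenset({"Rep"}), frozenset({"r3", "r4", "r5"}))
C2 = (frozenset({"Qua", "Rep"}), frozenset({"r1", "r2"}))


def is_diagnosis(constraints, removed, characterizations=(C1, C2)):
    """Theorem: each characterization is hit by a removed rule or by a constraint."""
    removed = frozenset(removed)
    return all(
        bool(rules & removed)
        # A fact base entails a constraint {p1, ..., pn} iff it contains it.
        or any(frozenset(c) <= facts for c in constraints)
        for facts, rules in characterizations
    )
```

For example, is_diagnosis([{"Rep", "Qua"}], {"r4"}) holds because r4 breaks C1 and the constraint {Rep, Qua} forbids the fact base of C2, whereas removing r4 alone leaves C2 untouched.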

The algorithm which computes the set of minimal diagnoses relative to a rule base 
from the minimal characterizations is an extension of the algorithm to compute 
diagnoses [Reiter87]. There are two important differences. First, in the data structure 
there are two different kinds of data taken into account: the rules and the input literals. 
A node corresponds to a characterization; for each rule it contains, the node has an
outgoing arc labeled by that rule, and it has one further outgoing arc labeled with
the fact base part of the characterization. Second, the characterizations must be sorted.
A diagnosis is obtained by collecting all the labels of the arcs from the root to a leaf
marked ✓.

function MinDiag(C) : a set of diagnoses

/* C = {C1,...,Cn} is the sorted collection of characterizations; let Ci = <FBi, ri>,
   Cj = <FBj, rj>; Ci < Cj iff FBj = {p'1,...,p'm} |= FBi = {p1,...,pn}. The function
   makes a tree whose nodes are labeled by some Ci = <FBi, ri>, and the arcs issued from
   Ci = <FBi, ri> are labeled by FBi or by a rule of ri. Hfb(Ni) is the set of FB that
   label the arcs that go from the root to Ni. Hr(Ni) is the set of rules that label
   the arcs that go from the root to Ni. */

MinDiag := {}
Label the root of the Tree with the first element of C
For each leaf Nk of Tree labeled by an element Ck = <FBk, rk>
    create a node Nj ; create an arc from Nk to Nj labeled by FBk
    For each r of rk
        create a node Nj ; create an arc from Nk to Nj labeled by r
    For each node Nj created with an arc from Nk to Nj
        If ∃ Ci = <FBi, ri> of C that verifies Hr(Nj) ∩ ri = {} and
           ∀ {f1,..,fn} of Hfb(Nj), FBi = {f'1,...,f'm} |≠ {f1,..,fn}
            Nj := the first Ci verifying the preceding condition
        elseif ∃ Nj' = ✓, Hr(Nj') ⊆ Hr(Nj), and
           ∀ {f'1,..,f'm} of Hfb(Nj'), ∃ {f1,..,fn} of Hfb(Nj) /
           {f'1,..,f'm} |= {f1,..,fn}
            close Nj with ×
        else close Nj with ✓, MinDiag := MinDiag ∪ {<Hfb(Nj), Hr(Nj)>}

Example 1. The schema shows how we find the diagnoses: <{{Rep}}, {}>...

4.2 Diagnosis Cost Computation

The following algorithm computes the cost of a given diagnosis. It uses the labels 
computed by ATMS during the diagnosis computation. The cost is composed 
of the cost of the rules of the diagnosis, plus the cost of the rules that are not useful 
when the integrity constraints are added and the rules of the diagnosis suppressed. 

function Cost(D, BD_ATMS) : integer   /* Determine the cost of D = <E_D, r_D> */
Cost := 0

/* cost of the rules r of r_D */
For each r of r_D
    Cost := Cost + C(r)

/* cost of the useless rules */
For each r of r_D
    Suppress all environments of BD_ATMS that contain r
Let R_ED be the set of the integrity constraints ∧_j l_ij → ⊥
    associated to E_D
Update BD_ATMS by adding the integrity constraints R_ED
For each r of KB \ r_D
    If there does not exist a non-contradictory environment in BD_ATMS
       that contains r
    Then Cost := Cost + C(r)

Example 1. If we call Cost(D1 = <{}, {r1, r3}>): /* cost of the rules r of r_D */ Cost := C(r1)
+ C(r3) = 5 + 100. /* cost of the useless rules */ BD_ATMS modified (the environments
containing r1 or r3 are suppressed): Label(Pac) = (); Label(¬Pac) = (Rep ∧ r2);
Label(Am) = (); Label(Bball) = (); Label(¬Bball) = (Rep ∧ r5); Label(Rep) = (Rep);
Label(Qua) = (Qua); Label(⊥) = (). There does not exist a non-contradictory
environment in the modified BD_ATMS that contains r4; Cost := 105 + C(r4) = 110.



5 Conclusion 

This paper presents a way to armor a knowledge base by removing some rules and
providing some integrity constraints. Adding to the armored KB a new piece of
information that is consistent with the constraints does not produce inconsistency;
consequently our approach avoids non-monotonic inference when new
information is added to a KB. So a priori revision is clearly not an AGM revision.
Nevertheless it could be interesting to study the links of our approach with contraction
[Gärdenfors88].

In previous work [Dupin-Loiseau00], we compared validation versus revision. The
validation approach attempts to measure the quality of the KB so that, if necessary, it can
suggest improvements to the expert. The KB refinement is supported by such a quality
measurement. Our new approach for a priori revision extends the notion of diagnosis
for validation [Bouali&al97] to diagnosis for revision. The computation of possible
diagnoses led us to make restrictions on the syntactical form of the KB and on
the new information; these restrictions are directly inspired by the validation field.
A point to study is whether considering any kind of formula as new information is of
any interest for a priori revision. If this is the case, we must study how the algorithms
given for a priori revision can be extended in order to deal with any knowledge base
in propositional logic.

We can remark that our minimality criterion is purely syntactic and does not
recognize that different sets of constraints are equivalent. So further study can
examine when it is interesting to propose a reformulation of the set of constraints.

Abstract. We propose a construction that allows one to define operators for
iterated revision from "classical" AGM revision operators. We call those
operators revision with memory operators. We show that the operators
obtained have nice logical properties. We illustrate this construction with
the well-known Dalal revision operator. We also give two new particular
revision operators based on the revision operator on OTP proposed by
Ryan [20]. His operator fails to satisfy many logical properties. The two
operators we give based on OTP satisfy all the wanted revision properties.



1 Introduction 

One of the predominant approaches to modelling belief change was proposed by
Alchourrón, Gärdenfors and Makinson and is known as the AGM framework
[1,10]. The core of this framework is a set of logical properties that a revision
operator has to satisfy to guarantee a nice behaviour.

A drawback of the AGM definition of revision is that it is a static one: with this
definition of revision operators, one can have a rational one-step revision, but
the conditions for the iteration of the process are very
weak. The problem is that the AGM postulates state conditions only between the
initial knowledge base, the new evidence and the resulting knowledge base. But
the way to perform further revisions on the new knowledge base does not depend
on the way the old knowledge base was revised.

Numerous proposals have tried to state a logical characterization that
adequately models iterated belief change behaviour [8,7,5,13,17,16,12]. The most
famous one seems to be [8]. The main idea that is common to all of those works
is that the belief base framework is not sufficient to encompass iterated revision,
since one needs some additional information for coding the revision policy of the
agent. So the need for epistemic states to encode the agent's "state of mind" is
widely accepted. An epistemic state encodes the agent's beliefs but also
its relative confidence in alternative possible states of the world. Epistemic
states can be represented by several means: pre-orders on interpretations [8,13],
conditionals [5,8], epistemic entrenchments [21,16], prioritized belief bases [2,3],
etc. In this paper we will focus on the representation of epistemic states in terms
of pre-orders on interpretations.

What we propose in this paper is not yet another logical characterization,
but the definition of a family of operators, which we call revision with memory
operators, that aims to have good iteration properties.

Dalal-like revision operators are sometimes decried for their a priori,
extra-logical information, namely the distance that they use to order
interpretations. We give two operators derived from Ryan's OTP revision operator
[20]. We will see that Ryan's operator does not satisfy the wanted logical
properties and give two modifications of Ryan's OTP revision operator that do. These
three operators are interesting since, contrary to Dalal-like operators, there is
no a priori distance. This information is provided by the formulae themselves in
a very natural (syntactical) way.

In section 2 we recall the logical characterization of Darwiche and Pearl. In
section 3 we give the definition of revision with memory operators and state the
general logical results. Then, in section 4 we provide five examples of operators.
Apart from Ryan's operator (section 4.3), which is not a revision with memory
operator, the four other operators have nice logical properties. Three of them
are, as far as we know, new operators. We conclude in section 5 with some general
remarks.

2 Iterated Revision Postulates 

We give here a formulation of the AGM postulates for belief revision à la Katsuno
and Mendelzon [11]. More exactly we give a formulation of these postulates in
terms of epistemic states [8]. The epistemic states framework is an extension of
the belief bases one. Intuitively an epistemic state can be seen as composite
information: the beliefs of the agent, plus all the information that the agent needs about
how to perform revision (preference ordering, conditionals, etc.). Then we give
the additional iteration postulates proposed by Darwiche and Pearl [8].

2.1 Formal Preliminaries 

We will work in the finite propositional case. A belief base φ is a set of formulae,
which can be considered as the formula that is the conjunction of its formulae.

The set of all interpretations is denoted W. Let φ be a formula; Mod(φ)
denotes the set of models of φ, i.e. Mod(φ) = {I ∈ W : I ⊨ φ}.

A pre-order ≤ is a reflexive and transitive relation, and < is its strict
counterpart, i.e. I < J if and only if I ≤ J and J ≰ I. As usual, ≃ is defined by I ≃ J
iff I ≤ J and J ≤ I.

To each epistemic state Ψ is associated a belief base Bel(Ψ) which is a
propositional formula and which represents the objective (logical) part of Ψ.
The models of Ψ are the models of its associated belief base, thus Mod(Ψ) =
Mod(Bel(Ψ)). Let Ψ be an epistemic state and μ be a sentence denoting the
new information; Ψ ∘ μ denotes the epistemic state resulting from the revision of Ψ
by μ. For reading convenience we will write respectively Ψ ⊢ μ, Ψ ∧ μ and I ⊨ Ψ
instead of Bel(Ψ) ⊢ μ, Bel(Ψ) ∧ μ and I ⊨ Bel(Ψ).

Two epistemic states are equivalent, noted Ψ ≡ Ψ', if and only if their
objective parts are equivalent formulae, i.e. Bel(Ψ) ↔ Bel(Ψ'). Two epistemic
states are equal, noted Ψ = Ψ', if and only if they are identical. Thus equality is
stronger than equivalence. In fact equivalence denotes a static equivalence, since
after a belief change, the two epistemic states can lead to very different ones,
whereas equality denotes a dynamic equivalence between epistemic states, since
all sequences of belief changes performed on these two epistemic states will lead to
two equal epistemic states¹.

2.2 AGM Postulates for Epistemic States 

Let Ψ be an epistemic state and μ and φ be formulae. An operator ∘ that maps
an epistemic state Ψ and a formula μ to an epistemic state Ψ ∘ μ is said to be a
revision operator on epistemic states if it satisfies the following postulates [8]:

(R*1) Ψ ∘ μ ⊢ μ

(R*2) If Ψ ∧ μ ⊬ ⊥, then Ψ ∘ μ ↔ Ψ ∧ μ
(R*3) If μ ⊬ ⊥, then Ψ ∘ μ ⊬ ⊥

(R*4) If Ψ1 = Ψ2 and μ1 ↔ μ2, then Ψ1 ∘ μ1 ≡ Ψ2 ∘ μ2

(R*5) (Ψ ∘ μ) ∧ φ ⊢ Ψ ∘ (μ ∧ φ)

(R*6) If (Ψ ∘ μ) ∧ φ ⊬ ⊥, then Ψ ∘ (μ ∧ φ) ⊢ (Ψ ∘ μ) ∧ φ

This is nearly the Katsuno and Mendelzon formulation of AGM postulates 
[11], the only differences are that we work with epistemic states instead of belief 
bases and that postulate (R*4) is weaker than its AGM counterpart. See [8] for 
a full motivation of this definition. 

A representation theorem, stating how revisions can be characterized in terms 
of pre-orders on interpretations, holds. In order to give such semantical repre- 
sentation, the concept of faithful assignment on epistemic states is defined. 

Definition 1. A function that maps each epistemic state Ψ to a pre-order ≤_Ψ
on interpretations is called a faithful assignment over epistemic states if and
only if:

1. If I ⊨ Ψ and J ⊨ Ψ, then I ≃_Ψ J

2. If I ⊨ Ψ and J ⊭ Ψ, then I <_Ψ J

3. If Ψ1 = Ψ2, then ≤_Ψ1 = ≤_Ψ2

Now the reformulation of the Katsuno and Mendelzon [11] representation
theorem in terms of epistemic states is:

Theorem 1 A revision operator ∘ satisfies postulates (R*1-R*6) if and only if
there exists a faithful assignment that maps each epistemic state Ψ to a total
pre-order ≤_Ψ such that:

Mod(Ψ ∘ μ) = min(Mod(μ), ≤_Ψ)

Notice that this theorem gives information only on the objective part of the
resulting epistemic state.

¹ Note that Ψ = Ψ' implies Ψ ≡ Ψ'.

2.3 Darwiche and Pearl Postulates

A strong limitation of the AGM revision postulates is that they impose very weak
constraints on the iteration of the revision process. Darwiche and Pearl [7,8]
proposed postulates for iterated revision. The aim of these postulates is to keep
as much as possible of the conditional beliefs² of the old belief base. So, besides
postulates (R*1-R*6), a revision operator has to satisfy:

(C1) If φ ⊢ μ, then (Ψ ∘ μ) ∘ φ ≡ Ψ ∘ φ
(C2) If φ ⊢ ¬μ, then (Ψ ∘ μ) ∘ φ ≡ Ψ ∘ φ

(C3) If Ψ ∘ φ ⊢ μ, then (Ψ ∘ μ) ∘ φ ⊢ μ

(C4) If Ψ ∘ φ ⊬ ¬μ, then (Ψ ∘ μ) ∘ φ ⊬ ¬μ

These postulates can be explained as follows: (C1) states that if two pieces of
information arrive and the second implies the first, the second alone would give
the same belief base. (C2) says that when two contradictory pieces of information
arrive, the second alone would give the same belief base. (C3) states that a piece of
information should be retained after revising by a second piece of information such that,
when revising the current belief base by it, the first one holds. (C4) says that no
piece of information can contribute to its own denial.

3 Building Memory Operators from “Classical” AGM 
Ones 

A “classical” AGM revision operator is equivalent to a faithful assignment over 
belief bases as stated in the following theorem [11]. 

Definition 2. A function that maps each belief base φ to a pre-order ≤_φ on
interpretations is called a faithful assignment over belief bases if and only if:

1. If I ⊨ φ and J ⊨ φ, then I ≃_φ J

2. If I ⊨ φ and J ⊭ φ, then I <_φ J

3. If φ1 ↔ φ2, then ≤_φ1 = ≤_φ2



Theorem 2 A revision operator ∘ satisfies the "classical" AGM postulates
(R1-R6)³ if and only if there exists a faithful assignment (over belief bases) that
maps each belief base φ to a total pre-order ≤_φ such that:

Mod(φ ∘ μ) = min(Mod(μ), ≤_φ)

So one can define a revision operator directly by defining the corresponding
faithful assignment over belief bases. This is the case for most distance-based
revision operators, such as the Dalal operator for example [6,11].

More precisely we say that a revision operator o is defined from a distance d 
iff the following conditions hold: 

² A conditional belief can be expressed as "if μ would be the case, then φ must be
true".

³ It is the same set of postulates as (R*1-R*6) but expressed for belief bases instead
of belief states (cf. [11]).







- d is a distance, that is d is a function d : W × W → ℝ⁺ that satisfies:

  d(I, J) = d(J, I) and d(I, J) = 0 iff I = J.

- Then the distance between an interpretation I and a belief base φ is defined

  as: d(I, φ) = min {d(I, J) : J ⊨ φ}

- This distance induces a faithful assignment: I ≤_φ J iff d(I, φ) ≤ d(J, φ)

- And the revision operator is defined by Mod(φ ∘ μ) = min(Mod(μ), ≤_φ)

One can check that the assignment obtained in this way is a faithful assignment
and thus that all operators defined in this way satisfy the AGM postulates. It can
also be easily checked that operators defined in this way fail to satisfy many
iterated revision postulates.
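
As a concrete instance of a distance-based operator, the Python sketch below uses the Hamming distance between interpretations, which is the distance usually associated with Dalal's operator. The representation is an assumption made for illustration: interpretations are tuples of booleans, formulas are given by their sets of models, and the function names are hypothetical. Both the belief base and the new information are assumed to be consistent.

```python
from itertools import product


def interpretations(n):
    """All interpretations over n propositional variables."""
    return list(product((False, True), repeat=n))


def hamming(i, j):
    """Number of variables on which two interpretations differ."""
    return sum(a != b for a, b in zip(i, j))


def dist_to_base(i, models_phi):
    """d(I, φ) = min {d(I, J) : J ⊨ φ}, for a consistent φ."""
    return min(hamming(i, j) for j in models_phi)


def revise(models_phi, models_mu):
    """Mod(φ ∘ μ) = min(Mod(μ), ≤_φ): the models of μ closest to φ."""
    best = min(dist_to_base(i, models_phi) for i in models_mu)
    return {i for i in models_mu if dist_to_base(i, models_phi) == best}


# Two variables (a, b): revise a ∧ b by ¬a.
W = interpretations(2)
phi = {(True, True)}
mu = {i for i in W if not i[0]}
```

Here revise(phi, mu) keeps the single interpretation where a is false and b is true: it is the model of ¬a that differs least from a ∧ b. The result is only a set of models, so nothing about the previous ordering is carried over to the next revision.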

Now we will give a construction that allows, from a given faithful assignment
(i.e. from a given "classical" revision operator), to define another revision
operator that satisfies the AGM postulates but also most of the iterated revision postulates.

First, let us notice that an epistemic state can be represented by a total
pre-order on interpretations, as suggested by theorem 1 and by several related works
(cf. e.g. [8,3]). So, with this particular representation, that is, if we identify the
epistemic state $\Psi$ with a pre-order $\le_\Psi$, the belief base $Bel(\Psi)$ is simply the
formula whose models are minimal for the pre-order, that is $Mod(Bel(\Psi)) = \min(W, \le_\Psi)$.
And the other interpretations are ordered according to their relative plausibility
for the agent. For example $I <_\Psi J$ means that the agent that is in the epistemic
state $\Psi$ considers $I$ as more plausible than $J$. It is this preferential information
that can be used to encompass the iterated revision behaviour, by considering
revision operators as functions that map a pre-order (epistemic state) and a
formula (new information) into a new pre-order (epistemic state). This idea is
the mainstay of most iterated revision works [21,8,16].

So using this representation by means of pre-orders on interpretations and
theorem 1 we will define a family of revision operators as follows:

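Definition 3. Suppose that we have at our disposal a function that maps each belief base
$\varphi$ to a pre-order $\le_\varphi$. Then we define the epistemic state (the pre-order) $\le_{\Psi \circ \varphi}$,
result of the revision of $\Psi$ by the new information $\varphi$, as:

$I \le_{\Psi \circ \varphi} J$ iff $I <_\varphi J$, or $I \simeq_\varphi J$ and $I \le_\Psi J$.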

Then one can check that: 

Theorem 3 If the function that maps each belief base $\varphi$ to a total pre-order $\le_\varphi$
is a faithful assignment over belief bases, then the revision operator on epistemic
states defined in definition 3 satisfies postulates (R*1-R*6). We will call these
operators revision operators with memory.

So with definition 3, one can start from any epistemic state (total pre-order
over interpretations) and carry on iterated revisions. A particular epistemic state
we can mention is the “empty” epistemic state, where the agent has no belief
and no preferential information, that is, such that $\forall I, J \; I \simeq J$. We will denote
this epistemic state by $\Xi$. So the objective part of this epistemic state is $Bel(\Xi) = \top$.
It can be considered as the epistemic state generalisation of $\top$ for the belief base
framework, since they are both neutral elements for the corresponding operators:
$\Xi \circ \varphi = \le_\varphi$ (as $\top \circ \varphi = \varphi$ in the belief base framework). One can consider that
all agents start with this epistemic state (we will consider this in the examples).

In fact the family defined is more specific than that, since there are more
properties that are satisfied by those operators:

(H4) If $\Psi_1 = \Psi_2$ and $\mu_1 \leftrightarrow \mu_2$, then $\Psi_1 \circ \mu_1 = \Psi_2 \circ \mu_2$
(C) If $\varphi \wedge \mu$ is satisfiable, then $\Psi \circ \varphi \circ \mu \equiv \Psi \circ (\varphi \wedge \mu)$

(H4) is a strengthening of (R*4). (C) states that revising successively by
two mutually consistent pieces of information amounts to revising by their conjunction.
It is close to a postulate proposed by Nayak et al. [17] called Conjunction,
but (C) is weaker than Conjunction, since it requires only the equivalence of
the two resulting epistemic states, not the equality. See [12] for a full logical
characterization of revision with memory operators.

Concerning the iteration postulates stated by Darwiche and Pearl [8]:

Theorem 4 Revision operators with memory satisfy postulates (C1), (C3) and
(C4).

It can be also easily checked that (C2) is satisfied by a unique operator with 
memory, since it demands (in the presence of the other revision postulates), that 
the pre-order associated to a belief base by the faithful assignment on belief base 
used in definition 3 is a two-level pre-order with the models of the belief base at 
the lowest level and the counter-models at the higher one. This operator will be 
presented in the next section. 

So most of our revision with memory operators do not satisfy (C2). But we 
do not consider this as a drawback. We rather think that it is (C2) that is not 
fully satisfactory. 

In fact, in [7] the set of postulates (C1-C4) was first given as a complement
to the usual “classical” AGM postulates. Freund and Lehmann [9] have
shown that (C2) is inconsistent with those postulates. Furthermore Lehmann
[13] has shown that (C1) plus AGM postulates imply (C3) and (C4). In [8]
Darwiche and Pearl have rephrased their postulates (and AGM ones) in terms
of epistemic states instead of belief bases, and thus have removed these logical
contradictions.

But we do not think that this is enough to rehabilitate (C2), and we think that
satisfying (C2) can lead to counterintuitive results. Consider the following example:

Example 1 Consider a circuit containing an adder and a multiplier. In this
example we have two atomic propositions, adder_ok and multiplier_ok, denoting
respectively the fact that the adder and the multiplier are working. We have
initially no information about this circuit ($\Psi = \Xi$) and we learn that the adder
and the multiplier are working ($\mu$ = adder_ok $\wedge$ multiplier_ok). Then someone
tells us that the adder is not working ($\varphi$ = $\neg$adder_ok). There is, then, no reason
to “forget” that the multiplier is working, which is imposed by (C2): $\varphi \vdash \neg\mu$, so
by (C2) we have $\Psi \circ \mu \circ \varphi \equiv \Psi \circ \varphi \equiv \varphi$.

This example is a slight modification of an example given in [8]. So, in some
cases, postulate (C2) induces exactly the same kind of bad behaviour it tries to
prevent.

4 Some Revision with Memory Operators

4.1 Basic Memory Operator 

Let us define the assignment that maps each belief base to a pre-order in the 
following way: 

Definition 4. $I \le_\varphi J$ if and only if $I \models \varphi$, or $I \not\models \varphi$ and $J \not\models \varphi$.

So we have what we shall call a basic order, which is a two-level order (at
most), with the models of $\varphi$ at the lower level and the other worlds at the higher
level.



Definition 5. The basic memory operator is the memory operator obtained 
from this assignment (i.e. the operator obtained by definitions 4 and 3). 

Even with this basic order on belief bases, one can build very complex epis- 
temic states. This is due to revision memory. We illustrate the behaviour of this 
operator through some simple examples. 

Example 2 Consider a language $\mathcal{L}$ with only two propositional letters a and b.
We will denote interpretations simply by the truth assignment, i.e. 10 denotes
the interpretation mapping a to true and b to false. Two interpretations are
equivalent, with respect to the pre-order, if they appear at the same level. An
interpretation I is better than another interpretation J (I < J) if it appears at
a lower level. Let us see some examples of epistemic states:

The assignment defined is a faithful assignment on belief bases; with theorems
3 and 4, it is easy to show that:

Theorem 5 The only revision operator with memory that satisfies (R*1-R*6)
and (C1-C4) is the basic memory revision operator.

This operator has already been studied in the literature under different
particular representations: in [16] with epistemic entrenchments, in [2] with
polynomials and syntactic belief bases. Finally, we can note that Liberatore has shown
[15] that several problems are computationally simpler for the basic memory
operator than for the other iterated belief revision proposals (including Boutilier’s
natural revision [4], Lehmann’s ranking revision [13] and Williams’
transmutations [21]).

4.2 Dalal Memory Operator

We use in this section the Hamming distance between interpretations, and then
the Dalal distance between an interpretation $I$ and a belief base $\varphi$ is defined as
$d(I, \varphi) = \min_{J \models \varphi} d(I, J)$.

Let’s define the assignment that maps each belief base to a pre-order in the 
following way: 

Definition 6. $I \le_\varphi J$ if and only if $d(I, \varphi) \le d(J, \varphi)$.

So we have a pre-order with the models of $\varphi$ at the lowest level and the other
worlds in the higher levels.

Definition 7. The Dalal memory operator is the memory operator obtained
from this assignment (i.e. the operator obtained by definitions 6 and 3).

We can show on a toy example that this operator differs from the classical Dalal
revision operator [6,11]. Let a and b be two propositional letters and consider for
example the sequence $\Psi = \Xi \circ a \circ b \circ \neg(a \wedge b)$. The classical Dalal operator gives
$Bel(\Psi) = (a \wedge \neg b) \vee (\neg a \wedge b)$, whereas the Dalal memory operator gives $Bel(\Psi) =
(\neg a \wedge b)$. This behaviour seems more natural since at the penultimate step we
learnt that b was true, and it is normal to keep some credit for this evidence in
the following step. It is in this way that our operators use revision “memory”.
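
The toy example can be replayed with a short Python sketch. It is an illustration only, with hypothetical names: an epistemic state is encoded as a tuple-valued key per interpretation, newest revision first, so that Python's lexicographic tuple comparison plays the role of Definition 3 (compare by the new formula, break ties with the old state). Formulas are given by their model sets and are assumed consistent.

```python
from itertools import product

W = list(product((0, 1), repeat=2))  # interpretations over (a, b)

def hamming(i, j):
    return sum(x != y for x, y in zip(i, j))

def dalal_rank(models):
    """Assignment of Definition 6: rank each I by its distance to the formula."""
    return {i: min(hamming(i, j) for j in models) for i in W}

def revise_with_memory(state, models):
    """Definition 3: the newest rank is compared first, older ranks break ties."""
    rank = dalal_rank(models)
    return {i: (rank[i],) + state[i] for i in W}

def bel(state):
    """Minimal interpretations of the pre-order encoded by `state`."""
    best = min(state.values())
    return {i for i in W if state[i] == best}

def classical(sequence):
    """Iterated classical Dalal revision: only the current models are kept."""
    current = set(W)
    for models in sequence:
        d = {i: min(hamming(i, j) for j in current) for i in models}
        current = {i for i in models if d[i] == min(d.values())}
    return current

empty = {i: () for i in W}  # the "empty" state: every interpretation ties
a = {i for i in W if i[0]}
b = {i for i in W if i[1]}
not_ab = {i for i in W if not (i[0] and i[1])}

state = empty
for formula in (a, b, not_ab):
    state = revise_with_memory(state, formula)

memory_result = bel(state)
classical_result = classical((a, b, not_ab))
```

With this encoding `memory_result` is the single interpretation 01 (not a, and b), while `classical_result` contains both 01 and 10, matching the two belief bases given in the text.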

4.3 Ryan OTP Operator 

Mark Ryan has proposed to apply his Ordered Presentations of Theories (or
OTP) to belief revision [20]. Very roughly, an OTP is a multi-set of formulae
equipped with a partial pre-order. This pre-order represents the relative
reliability of the sources of each formula. So, using a linear order, one can express
the fact that the new information is more reliable than older ones and thus can
simulate iterated revisions. Giving the full definition of OTP is beyond the scope of this
work; the interested reader can see e.g. [19]. We will simply introduce the notions
needed to define the OTP revision operator.

First we have to define what the monotonicities of a formula are. 

Definition 8. Let $I$ be an interpretation and $p$ be a propositional letter; then
$I^{[p]}$ (respectively $I^{[\neg p]}$) denotes the interpretation that is identical to $I$ on each
propositional letter except (maybe) on the propositional letter $p$, which is assigned
to true (resp. false).

Definition 9. Let $\varphi$ be a consistent formula and $p$ be a propositional letter.

1. $\varphi$ is monotonic in $p$ if $I \models \varphi$ implies that $I^{[p]} \models \varphi$.

2. $\varphi$ is anti-monotonic in $p$ if $I \models \varphi$ implies that $I^{[\neg p]} \models \varphi$.

The Hamming distance between two interpretations is the number of propositional
letters on which the two interpretations differ.

The set of symbols in which $\varphi$ is monotonic (resp. anti-monotonic) is denoted
$\varphi^+$ (resp. $\varphi^-$). If $\varphi \leftrightarrow \bot$, then $\varphi^+ = \varphi^- = \emptyset$.

After this definition, Ryan defines an inference relation that he named natural 
entailment. 

Definition 10. $\varphi$ naturally entails $\mu$, written $\varphi \Vdash \mu$, if $\varphi \vdash \mu$, $\varphi^+ \subseteq \mu^+$ and
$\varphi^- \subseteq \mu^-$.

This relation has some nice properties; in particular it does not allow adding
irrelevant disjuncts to the conclusions (for example $p \not\Vdash p \vee q$). See [19] for more
details.
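
As a small check of the example above, the monotonicities and the natural entailment test can be computed by brute force. The sketch below is an illustration with hypothetical names; formulas are represented by their model sets over two letters p and q.

```python
from itertools import product

LETTERS = ("p", "q")
W = list(product((0, 1), repeat=len(LETTERS)))

def models(pred):
    """Model set of a formula given as a predicate on a letter -> 0/1 mapping."""
    return {i for i in W if pred(dict(zip(LETTERS, i)))}

def force(i, k, value):
    """I[p] or I[not p]: set the k-th letter of I to `value`."""
    return i[:k] + (value,) + i[k + 1:]

def monotonicities(mods):
    """Return (phi+, phi-) for a formula given by its model set."""
    if not mods:  # inconsistent formula: both sets are empty
        return set(), set()
    plus = {p for k, p in enumerate(LETTERS)
            if all(force(i, k, 1) in mods for i in mods)}
    minus = {p for k, p in enumerate(LETTERS)
             if all(force(i, k, 0) in mods for i in mods)}
    return plus, minus

def naturally_entails(phi, mu):
    """phi entails mu, phi+ is included in mu+, and phi- is included in mu-."""
    p_plus, p_minus = monotonicities(phi)
    m_plus, m_minus = monotonicities(mu)
    return phi <= mu and p_plus <= m_plus and p_minus <= m_minus

p = models(lambda v: v["p"])
p_or_q = models(lambda v: v["p"] or v["q"])
```

The formula p is anti-monotonic in q (its truth does not depend on q), whereas p or q is not, so `naturally_entails(p, p_or_q)` is false even though p classically entails p or q.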

Finally, the preference relation associated with a formula p is given by the 
set of natural consequences that the interpretations satisfy, that is: 

Definition 11. Let $\varphi$ be a formula, and $I$, $J$ two interpretations; the relation
$\preceq_\varphi$ is defined as: $I \preceq_\varphi J$ if for each $\mu$ such that $\varphi \Vdash \mu$, ($J \models \mu \Rightarrow I \models \mu$)
holds.

So an interpretation is better than another if it satisfies more natural
consequences. Note that the relation $\preceq_\varphi$ is a partial pre-order.

Definition 12. The Ryan operator is the operator obtained from this assign- 
ment (i.e. the operator obtained by definitions 11 and 3). 

Because the starting assignment takes partial pre-orders as values, Ryan 
operator does not satisfy all the postulates. More precisely, one has the following 
result [20]: 

Theorem 6 The Ryan revision operator satisfies postulates (R*1), (R*3),
(R*4), and (R*5), but does not satisfy (R*2) and (R*6).

A counter-example to (R*2) and (R*6), given in [20], is the following:

Let $\varphi_1 = p \wedge q \wedge r$, $\varphi_2 = \neg p \vee \neg q \vee \neg r$ and $\varphi_3 = (p \leftrightarrow q) \vee \neg r$. Then
for (R*2), take $\Psi = \Xi \circ \varphi_1 \circ \varphi_2$ and $\mu = \varphi_3$. Then $Mod(\Psi) = \{011, 101, 110\}$
and $Mod(\mu) = \{000, 001, 010, 100, 110, 111\}$, so $Mod(\Psi \wedge \mu) = \{110\}$ whereas
$Mod(\Psi \circ \mu) = \{110, 001\}$. The same counter-example also holds for (R*6), by
putting $\Psi = \Xi \circ \varphi_1$, $\mu = \varphi_2$ and $\varphi = \varphi_3$.

These two violations of the rationality postulates seem to be very awkward.
Especially (R*2) seems hardly debatable. The question is: how can we modify
Ryan’s definition in order to satisfy these properties? In fact, the easiest way
to modify this operator in order to obtain revision with memory operators is to
“complete” the $\preceq_\varphi$ partial pre-orders in order to obtain total pre-orders. This
can be achieved by two means that give the two following operators.

4.4 Closure of the Pre-order 

First, following the construction of the rational closure of a conditional belief
base [14] (see also Pearl’s System Z [18]), we can figure out a lazy deformation
of the pre-order, that is, the deformation that transforms the partial pre-order
into a total pre-order with a minimal effort.

Definition 13. Let $\rho_\varphi(I)$ be the “distance from $I$ to $\varphi$” in the following sense:

1. If $I \in \min(W, \preceq_\varphi)$, then $\rho_\varphi(I) = 0$,

2. Otherwise $\rho_\varphi(I) = a$, where $a$ is the length of the longest chain of strict
inequalities $I_0 \prec_\varphi \dots \prec_\varphi I_n$ with $I_0 \in \min(W, \preceq_\varphi)$ and $I_n = I$.

This “distance” gives a total pre-order on interpretations:

Definition 14. $I \le_\varphi J$ if and only if $\rho_\varphi(I) \le \rho_\varphi(J)$.

We illustrate this principle of “minimal effort” with an example: Let $\varphi =
(\neg a \vee \neg b) \wedge \neg c$ be a belief base.
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Fig. 1 . Closure 
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the pre-order 



The left hand side presents the partial pre-order Arrows / <— J denote 
I <ip J (for reading convenience we do not represent transitivity, reffexivity and 
the equivalence between minimal interpretations). The right hand side presents 
the corresponding pre-order. It is clear that if / J then I J. 

Thus the only interpretation that is not straightforwardly placed is 110. The 
“minimal effort” is being illustrated here as follows: the first place where can 
be placed 001 is at the second level, so it is the chosen level. Conversely, for 
the interpretation Oil for example, the first “acceptable” level is the third one 
because there is an interpretation (001) that is strictly better than 011 which is 
occupying the second level. 

It is easy to show that the function that maps each belief base p to a, total 
pre-order is a faithful assignement. Then we build our memory operator 

as usual: 

Definition 15. The OTP1 memory operator is the memory operator obtained
from this assignment (i.e. the operator obtained by definitions 14 and 3).

4.5 Using Cardinalities 

A second way to define a total pre-order from Ryan revision operator is to
interpret it differently. The idea of the $\preceq_\varphi$ order, defining Ryan operator, is that
an interpretation $I$ is better than another $J$ for a belief base $\varphi$ if $I$ satisfies all
the natural consequences that $J$ satisfies. In other terms $I$ is better than $J$ if
$I$ satisfies more natural consequences than $J$. Following this idea we can then
focus solely on the number of natural consequences satisfied.

Fig. 2. Behaviour differences between revision with memory operators

Definition 16. $I \le_\varphi J$ if and only if $card(\{\mu \mid \varphi \Vdash \mu \text{ and } J \models \mu\}) \le
card(\{\mu \mid \varphi \Vdash \mu \text{ and } I \models \mu\})$.

This definition is also a “completion” of the $\preceq_\varphi$ pre-order since if $I \preceq_\varphi J$,
then $I \le_\varphi J$.

Then, as usual:

Definition 17. The OTP2 memory operator is the memory operator obtained
from this assignment (i.e. the operator obtained by definitions 16 and 3).



5 Conclusion 

We will end by showing that the four revision with memory operators defined
are different. To show that, it is enough to show that the corresponding faithful
assignments are different. We will show that on the formula $\varphi = (\neg a \vee \neg b) \wedge c$.
In figure 2 one can check that the four pre-orders obtained are different.

We have proposed in this paper a method to build revision operators that 
have interesting properties for iterated revision from any classical AGM operator. 

This family of operators exhibits the fact that Darwiche and Pearl’s (C2) 
postulate is certainly too demanding. 

We have also introduced two new operators based on Ryan revision operator 
[20]. An open question is to know if those operators can be recovered from the 
definition of a classical distance-based revision operator. 

We have mainly dealt in this paper with the generic construction of iterated
revision operators from classical AGM operators, but a full logical
characterization of revision with memory operators can be found in [12].

Abstract. Belief change scenarios were recently introduced as a framework for
expressing different forms of belief change. In this paper, we show how belief
revision and belief contraction (within belief change scenarios) can be
axiomatised by means of quantified Boolean formulas. This approach has several
benefits. First, it furnishes an axiomatic specification of belief change within belief
change scenarios. Second, this axiomatisation allows us to identify upper bounds
for the complexity of revision and contraction within belief change scenarios. We
strengthen these upper bounds by providing strict complexity results for the
considered reasoning tasks. Finally, we obtain an implementation of different forms
of belief change by appeal to the existing system QUIP.



1 Introduction 

In [3] a consistency-based framework for expressing belief change is developed. The 
essential idea with respect to revision is that, given a knowledge base $K$ and a sentence
for revision $\alpha$, we express $K$ and $\alpha$ in disjoint languages; coerce (via a maximisation
process) the languages to agree on truth values of atoms wherever consistently possible; 
and finally re-express the result in a single language. The inherent non-determinism of 
the maximisation process gives rise to two notions of revision. In choice revision one 
such “extension” is selected for the revised state. In general (skeptical) revision, the 
revised state consists of the intersection of all such extensions. Belief contraction is 
defined similarly. 

In this paper, we discuss a method to implement this approach to belief change,
based on reductions to quantified Boolean formulas. By a quantified Boolean formula
(or “QBF” for short) one understands a term which is constructed like an ordinary
propositional formula, except that quantifiers ranging over propositional variables may
also occur. Quantified Boolean formulas thus belong to the language of second-order
logic and allow a compact representation of a large class of properties. Indeed, the
latter is reflected by the fact that the evaluation problem of QBFs (i.e., the problem
of determining the truth of a given QBF) is PSPACE-complete, whilst the evaluation
problem of QBFs having prenex normal form with i alternating quantifiers is complete
for the i-th level of the polynomial hierarchy [17; 18].

The general mechanism of our approach is to translate (in a polynomial way) a given
reasoning task into the evaluation problem for QBFs and then to use a sophisticated
QBF-evaluator to compute the resultant instances. The existence of efficient QBF-solvers, such
as the systems developed by Cadoli et al. [2], Kleine Büning et al. [12], Rintanen [16],
or Feldmann et al. [8], makes such a rapid prototyping approach practically applicable.

A similar approach for solving various reasoning tasks belonging to the area of
nonmonotonic reasoning has been realized in the system QUIP [7; 6; 5]. This
prototypical implementation currently handles the computation of the main reasoning tasks
for logic-based abduction, default logic, several types of modal nonmonotonic logics,
and disjunctive logic programs under the stable model semantics. We implemented the
translations for belief change problems by incorporating them into the system QUIP.

Reduction methods to QBFs naturally generalize similar approaches for problems 
in NP; these latter problems can in turn be solved by translating them (in polynomial 
time) to SAT, the satisfiability problem of classical propositional logic (see e.g., [11] for 
such an application in Artificial Intelligence). Besides the implementation of different
nonmonotonic reasoning tasks as realized by the system QUIP, reductions to QBFs
have also been applied successfully to conditional planning [15].

The expression of belief change problems in terms of QBFs also allows the derivation 
of upper complexity bounds for the considered reasoning tasks. Moreover, we generalise 
and improve on the results in [3]. 

In the next section we briefly introduce QBFs, and describe the belief change scenar- 
ios that interest us. In Section 3 we give the polynomial-time constructible reductions 
of the relevant reasoning tasks into QBFs. Section 4 briefly sketches an implementation 
of the reductions, and Section 5 supplies some concluding remarks. 



2 Background 

2.1 Logical Prerequisites 

We deal with propositional languages and use the logical symbols $\top$, $\bot$, $\neg$, $\vee$, $\wedge$, $\to$,
and $\equiv$ to construct formulas in the standard way. We write $\mathcal{L}_\mathcal{P}$ to denote a language over
an alphabet $\mathcal{P}$ of propositional letters or atomic propositions. Formulas are denoted
by Greek lower-case letters (possibly with subscripts). Knowledge bases, or,
equivalently, belief sets, are initially identified with deductively-closed sets of formulas (we
use $K, K_1, \dots$ to denote knowledge bases). So, we have $K = Cn(K)$, where $Cn(\cdot)$
is the deductive closure of the formula or set of formulas given as argument. Later we
relax this restriction.

Given an alphabet $\mathcal{P}$, we define a disjoint alphabet $\mathcal{P}'$ as $\mathcal{P}' = \{p' \mid p \in \mathcal{P}\}$. Then,
for $\alpha \in \mathcal{L}_\mathcal{P}$, we define $\alpha'$ as the result of replacing in $\alpha$ each atom $p$ from $\mathcal{P}$ by the
corresponding atom $p'$ in $\mathcal{P}'$ (so implicitly there is an isomorphism between $\mathcal{P}$ and $\mathcal{P}'$).
This is defined analogously for sets of formulas.

Quantified Boolean formulas (QBFs) generalize ordinary propositional formulas
by the admission of quantifications over propositional variables (QBFs are denoted by
Greek upper-case letters). Informally, a QBF of the form $\forall p\, \exists q\, \Phi$ means that for all truth
assignments of $p$ there is a truth assignment of $q$ such that $\Phi$ is true. For instance, it is
easily seen that the QBF $\exists p_1 \exists p_2 ((p_1 \to p_2) \wedge \forall p_3 (p_3 \to p_2))$ evaluates to true.

The precise semantical meaning of QBFs is defined as follows. First, some ancillary
notation. An occurrence of a variable $v$ in a QBF $\Phi$ is free iff it does not appear in
the scope of a quantifier $Qv$ ($Q \in \{\forall, \exists\}$), otherwise the occurrence of $v$ is bound.
If $\Phi$ contains no free variables, then $\Phi$ is closed, otherwise $\Phi$ is open. Furthermore,
$\Phi[v_1/\phi_1, \dots, v_n/\phi_n]$ denotes the result of uniformly substituting the free variables $v_i$
in $\Phi$ by formulas $\phi_i$ ($1 \le i \le n$).

By an interpretation, $M$, we understand a set of variables. Informally, a variable $v$
is true under $M$ iff $v \in M$. In general, the truth value, $\nu_M(\Phi)$, of a QBF $\Phi$ under an
interpretation $M$ is recursively defined as follows:

1. if $\Phi = \top$, then $\nu_M(\Phi) = 1$;

2. if $\Phi = \bot$, then $\nu_M(\Phi) = 0$;

3. if $\Phi = v$ is a variable, then $\nu_M(\Phi) = 1$ if $v \in M$, and $\nu_M(\Phi) = 0$ otherwise;

4. if $\Phi = \neg\Psi$, then $\nu_M(\Phi) = 1 - \nu_M(\Psi)$;

5. if $\Phi = (\Phi_1 \wedge \Phi_2)$, then $\nu_M(\Phi) = \min(\{\nu_M(\Phi_1), \nu_M(\Phi_2)\})$;

6. if $\Phi = (\Phi_1 \vee \Phi_2)$, then $\nu_M(\Phi) = \max(\{\nu_M(\Phi_1), \nu_M(\Phi_2)\})$;

7. if $\Phi = (\Phi_1 \to \Phi_2)$, then $\nu_M(\Phi) = 1$ iff $\nu_M(\Phi_1) \le \nu_M(\Phi_2)$;

8. if $\Phi = \forall v\, \Psi$, then $\nu_M(\Phi) = \nu_M(\Psi[v/\top] \wedge \Psi[v/\bot])$;

9. if $\Phi = \exists v\, \Psi$, then $\nu_M(\Phi) = \nu_M(\Psi[v/\top] \vee \Psi[v/\bot])$.

We say that $\Phi$ is true under $M$ iff $\nu_M(\Phi) = 1$, otherwise $\Phi$ is false under $M$. If
$\nu_M(\Phi) = 1$, then $M$ is a model of $\Phi$. Likewise, for a set $S$ of formulas, if $\nu_M(\phi) = 1$ for
all $\phi \in S$, then $M$ is a model of $S$. If $\Phi$ has some model, then $\Phi$ is said to be satisfiable.
If $\Phi$ is true under any interpretation, then $\Phi$ is valid. Observe that a closed QBF is either valid
or unsatisfiable, because closed QBFs are either true under each interpretation or false
under each interpretation. Hence, for closed QBFs, there is no need to refer to particular
interpretations.
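
The recursive definition above translates almost line by line into a naive evaluator. The following Python sketch is an illustration only (exponential in the number of quantifiers, nothing like a real QBF solver); the tuple encoding of formulas and the names `value` and `var` are assumptions made for this example. Substituting $\top$ or $\bot$ for a quantified variable is implemented by adding the variable to, or removing it from, the interpretation.

```python
def value(phi, m=frozenset()):
    """Truth value (0 or 1) of the QBF `phi` under the interpretation `m`."""
    op = phi[0]
    if op == "top":
        return 1
    if op == "bot":
        return 0
    if op == "var":
        return 1 if phi[1] in m else 0
    if op == "not":
        return 1 - value(phi[1], m)
    if op == "and":
        return min(value(phi[1], m), value(phi[2], m))
    if op == "or":
        return max(value(phi[1], m), value(phi[2], m))
    if op == "imp":
        return 1 if value(phi[1], m) <= value(phi[2], m) else 0
    if op in ("forall", "exists"):
        v, body = phi[1], phi[2]
        # Evaluate the body once with v true and once with v false.
        with_v = value(body, m | {v})
        without_v = value(body, m - {v})
        if op == "forall":
            return min(with_v, without_v)
        return max(with_v, without_v)
    raise ValueError(f"unknown connective: {op!r}")

def var(name):
    return ("var", name)

# exists p1 exists p2 ((p1 -> p2) and forall p3 (p3 -> p2))
example = ("exists", "p1", ("exists", "p2",
           ("and", ("imp", var("p1"), var("p2")),
                   ("forall", "p3", ("imp", var("p3"), var("p2"))))))
```

Since `example` is closed, its value does not depend on the interpretation passed in; choosing p2 true makes both conjuncts hold, so `value(example)` is 1.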

In the sequel, we use the following abbreviations in the context of QBFs: Let
$S = \{\phi_1, \dots, \phi_n\}$ and $T = \{\psi_1, \dots, \psi_n\}$ be indexed sets of formulas. Then, $S \le T$
abbreviates $\bigwedge_{i=1}^{n} (\phi_i \to \psi_i)$, and $S = T$ is a shorthand for $\bigwedge_{i=1}^{n} (\phi_i \equiv \psi_i)$.
Furthermore, for a set $P = \{p_1, \dots, p_n\}$ of propositional variables and a quantifier $Q \in \{\forall, \exists\}$,
we let $QP\, \Phi$ stand for the formula $Qp_1 Qp_2 \cdots Qp_n\, \Phi$. Additionally, for each variable $v$
occurring in some formula, $v_{eq}$ denotes a globally new variable. Accordingly, for a set
$V$ of variables, we define $V_{eq} = \{v_{eq} \mid v \in V\}$. Finally, finite sets $T = \{\phi_1, \dots, \phi_n\}$
of formulas are usually identified with the conjunction $\bigwedge_{i=1}^{n} \phi_i$ of their elements.

The operator $\le$ is a fundamental tool for expressing tests on sets of formulas which
are required to satisfy certain conditions. In particular, we use $\le$ here in connection with
the following task: Given finite sets $T$ and $P = \{\phi_1, \dots, \phi_n\}$ of formulas, we want to
compute all subsets $R \subseteq P$ such that $T \cup R$ is consistent.

This problem can be expressed by the QBF

$\Phi_{\le} = \exists V (T \wedge (G \le P))$,

where $V$ is the set of all variables occurring in $T$ or $P$, and $G = \{g_1, \dots, g_n\}$ is a set of
new variables.

Note that $G$ constitutes the set of free variables of $\Phi_{\le}$. These variables facilitate
the selection of those elements of $P$ which determine the sets $R$ such that $T \cup R$ is
consistent. More precisely, we have the following properties:

- If $M$ is a model of $\Phi_{\le}$, then $T \cup \{\phi_i \mid g_i \in M, 1 \le i \le n\}$ is consistent.

- If $T \cup R$ is consistent, with $R \subseteq P$, then $\{g_i \mid \phi_i \in R, 1 \le i \le n\}$ is a model
of $\Phi_{\le}$.

Let us illustrate the first of these properties. Consider a model $M$ of

$\Phi_{\le} = \exists V (T \wedge (g_1 \to \phi_1) \wedge (g_2 \to \phi_2) \wedge \dots \wedge (g_n \to \phi_n))$.

Clearly, under $M$, each conjunct $(g_i \to \phi_i)$ evaluates to true if $g_i \notin M$, and reduces to
$\phi_i$ otherwise. Hence, $\Phi_{\le}$ can be transformed into

$\exists V (T \wedge \bigwedge_{g_i \in M} \phi_i)$.   (1)

Obviously, (1) is true under $M$ iff $T \cup \{\phi_i \mid g_i \in M\}$ is consistent.

Note that $\Phi_{\le}$ is constructed to express all subsets $R \subseteq P$ such that $T \cup R$ is
consistent. To express, for example, all maximal such subsets, some additional elements
are required. The computation of maximal sets satisfying certain criteria, using QBFs,
is discussed in Section 3.
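
The role of the guess variables $g_i$ can be illustrated by brute force. The Python sketch below is only an illustration with hypothetical names and a made-up instance: formulas are predicates over assignments to the variables in `V`, and each model of the guess formula is returned as the set of (0-based) indices of the selected elements of P.

```python
from itertools import product

V = ("p", "q")

def assignments(names):
    """All truth assignments to `names`, as dictionaries."""
    for bits in product((False, True), repeat=len(names)):
        yield dict(zip(names, bits))

def guess_models(T, P):
    """Index sets G such that exists V (T and (g_i -> phi_i for all i)) holds."""
    result = []
    for guess in product((False, True), repeat=len(P)):
        witness = any(
            all(t(v) for t in T)
            and all((not g) or phi(v) for g, phi in zip(guess, P))
            for v in assignments(V)
        )
        if witness:
            result.append(frozenset(i for i, g in enumerate(guess) if g))
    return result

# Hypothetical instance: T = {p}, P = {not p, q, p -> not q}.
T = [lambda v: v["p"]]
P = [lambda v: not v["p"],
     lambda v: v["q"],
     lambda v: (not v["p"]) or (not v["q"])]
selected = guess_models(T, P)
```

On this instance the first element of P contradicts T, and the last two cannot be selected together, so the models correspond to the empty selection and to the two singletons {q} and {p -> not q}. As noted in the text, maximality is not enforced at this stage.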



2.2 Belief Change Scenarios 

Following Delgrande and Schaub [3], we define a belief change scenario in language $\mathcal{L}_\mathcal{P}$
as a triple $B = (K, U_1, U_2)$, where $K$, $U_1$, $U_2$ are sets of formulas in $\mathcal{L}_\mathcal{P}$. Informally, $K$
is a knowledge base that will be changed such that the set $U_1$ will be true in the result,
and the set $U_2$ will be consistent with the result. For a base approach to revision we take
$U_2 = \emptyset$ and for a base approach to contraction we take $U_1 = \emptyset$.

In the definition below, “maximal” is with respect to set containment (rather than set
cardinality). The following definition is central:

Definition 1. Let $B = (K, U_1, U_2)$ be a belief change scenario in $\mathcal{L}_\mathcal{P}$. Define $EQ$ as a
maximal set of equivalences $EQ \subseteq \{p \equiv p' \mid p \in \mathcal{P}\}$ such that

$K' \cup EQ \cup U_1 \cup U_2 \nvdash \bot$.

Then,

$Cn(K' \cup EQ \cup U_1) \cap \mathcal{L}_\mathcal{P}$

is a consistent definitional extension of $B$.

Table 1. (Skeptical) revision examples.

$K'$                    $\alpha$                $EQ$                                    $K \dotplus \alpha$
$p' \wedge q'$          $\neg q$                $\{p \equiv p'\}$                       $p \wedge \neg q$
$\neg p' \equiv q'$     $\neg q$                $\{p \equiv p', q \equiv q'\}$          $p \wedge \neg q$
$p' \vee q'$            $\neg p \vee \neg q$    $\{p \equiv p', q \equiv q'\}$          $p \equiv \neg q$
$p' \wedge q'$          $\neg p \vee \neg q$    $\{p \equiv p'\}, \{q \equiv q'\}$      $p \equiv \neg q$
Hence, a consistent definitional extension of $B$ is a modification of $K$ in which
$U_1$ is true, and in which $U_2$ is consistent. We say that $EQ$ underlies the
consistent definitional extension of $B$. In the sequel, we restrict the sets of equivalences to
$\{p \equiv p' \mid p \text{ occurs in } K \cup U_1 \cup U_2\}$. Clearly, for a given belief change scenario there
may be more than one consistent definitional extension.

In Definition 2 and 3 below, we make use of the notion of a selection function, $c$,
that for any set $I \neq \emptyset$ has as value some element of $I$. These primitive functions can
be regarded as inducing selection functions $c'$ on belief change scenarios, such that
$c'((K, U_1, U_2))$ has as value some consistent definitional extension of $(K, U_1, U_2)$. This
is a slight generalisation of selection functions as found in the AGM approach [10].

Definition 1 provides a very general framework for specifying belief change. In the 
next two definitions we give specific definitions for the belief-change operations revision 
and contraction. 

Definition 2 (Revision). Let $K$ be a knowledge base and $\alpha$ a formula, and let $(E_i)_{i \in I}$
be the family of all consistent definitional extensions of $(K, \{\alpha\}, \emptyset)$. Then:

1. $K \dotplus_c \alpha = E_i$ is a choice revision of $K$ by $\alpha$ with respect to some selection function
$c$ with $c(I) = i$.

2. $K \dotplus \alpha = \bigcap_{i \in I} E_i$ is the (skeptical) revision of $K$ by $\alpha$.

Table 1 gives examples of (skeptical) revision. The first column gives the original
knowledge base, but with atoms already renamed. The second column gives the revision
formula, while the third gives the $EQ$ set(s) and the last column gives the results of the
revision. For the first and last column, we give a formula whose deductive closure gives
the corresponding belief set.

In detail, for the last example, we wish to determine

$\{p \wedge q\} \dotplus (\neg p \vee \neg q)$.

We find maximal sets $EQ \subseteq \{p \equiv p', q \equiv q'\}$ such that

$\{p' \wedge q'\} \cup EQ \cup \{\neg p \vee \neg q\} \cup \emptyset$

is consistent. We get two such sets of equivalences, namely $EQ_1 = \{p \equiv p'\}$ and
$EQ_2 = \{q \equiv q'\}$. Accordingly, we obtain

$\{p \wedge q\} \dotplus (\neg p \vee \neg q) = \bigcap_{i=1,2} Cn(\{p' \wedge q'\} \cup EQ_i \cup \{\neg p \vee \neg q\}) \cap \mathcal{L}_\mathcal{P}$.

In addition to $(\neg p \vee \neg q)$, we get $(p \vee q)$, jointly implying $(p \equiv \neg q)$.
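
The worked example can be checked by exhaustive enumeration. The Python sketch below is an illustration with hypothetical names, not the QBF-based method of the paper: formulas are predicates over assignments to the atoms and their primed copies, an $EQ$ set is represented by the atoms it equates, and an extension is represented by its models over the unprimed atoms. The skeptical revision, being an intersection of theories, has as models the union of the models of the extensions.

```python
from itertools import combinations, product

ATOMS = ("p", "q")
PRIMED = tuple(a + "'" for a in ATOMS)

def assignments(names):
    for bits in product((False, True), repeat=len(names)):
        yield dict(zip(names, bits))

def worlds(k_primed, eq, u1, u2=lambda v: True):
    """Assignments over both alphabets satisfying K', EQ, U1 and U2."""
    return [v for v in assignments(ATOMS + PRIMED)
            if k_primed(v) and u1(v) and u2(v)
            and all(v[a] == v[a + "'"] for a in eq)]

def maximal_eq_sets(k_primed, u1, u2=lambda v: True):
    """Inclusion-maximal EQ sets keeping K' + EQ + U1 + U2 consistent."""
    ok = [frozenset(c) for r in range(len(ATOMS) + 1)
          for c in combinations(ATOMS, r)
          if worlds(k_primed, c, u1, u2)]
    return [e for e in ok if not any(e < f for f in ok)]

def extension_models(k_primed, eq, u1):
    """Models, over the unprimed atoms, of Cn(K' + EQ + U1)."""
    return {tuple(v[a] for a in ATOMS) for v in worlds(k_primed, eq, u1)}

k = lambda v: v["p'"] and v["q'"]                # K' = p' and q'
alpha = lambda v: (not v["p"]) or (not v["q"])   # alpha = not p or not q

eqs = maximal_eq_sets(k, alpha)
skeptical = set().union(*(extension_models(k, e, alpha) for e in eqs))
```

The two maximal sets equate p with p' and q with q' respectively, and the resulting models are exactly those in which p and q take different values, in line with $p \equiv \neg q$.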

Contraction is defined similarly to revision.

Table 2. (Skeptical) contraction examples.

$K'$                        $\alpha$         $EQ$                                    $K \dot{-} \alpha$
$p' \wedge q'$              $q$              $\{p \equiv p'\}$                       $p$
$p' \wedge q' \wedge r'$    $p \vee q$       $\{r \equiv r'\}$                       $r$
$p' \vee q'$                $p \wedge q$     $\{p \equiv p', q \equiv q'\}$          $p \vee q$
$p' \wedge q'$              $p \wedge q$     $\{p \equiv p'\}, \{q \equiv q'\}$      $p \vee q$

Definition 3 (Contraction). Let $K$ be a knowledge base and $\alpha$ a formula, and let $(E_i)_{i \in I}$
be the family of all consistent definitional extensions of $(K, \emptyset, \{\neg\alpha\})$. Then:

1. $K \dot{-}_c \alpha = E_i$ is a choice contraction of $K$ by $\alpha$ with respect to some selection
function $c$ with $c(I) = i$.

2. $K \dot{-} \alpha = \bigcap_{i \in I} E_i$ is the (skeptical) contraction of $K$ by $\alpha$.

Table 2 gives examples of (skeptical) contraction, using the same format and conventions
as Table 1.

In detail, for the first example we wish to determine

$\{p \wedge q\} \dot{-} q$.

We compute the consistent definitional extensions of $(\{p \wedge q\}, \emptyset, \{\neg q\})$. We rename the
propositions in $\{p \wedge q\}$ and look for maximal subsets $EQ$ of $\{p \equiv p', q \equiv q'\}$ such that

$\{p' \wedge q'\} \cup EQ \cup \emptyset \cup \{\neg q\}$

is consistent. We obtain $EQ = \{p \equiv p'\}$, yielding

$\{p \wedge q\} \dot{-} q = Cn(\{p' \wedge q'\} \cup \{p \equiv p'\} \cup \emptyset) \cap \mathcal{L}_\mathcal{P}$.



2.3 Related Work 

The approach of the previous subsection combines aspects of general (coherence-based) 
belief change with aspects of belief base revision [13]. In belief base revision, a knowl- 
edge base is an (arbitrary, syntactic) set of formulas that is to be modified, and that 
represents or characterises the logical closure of this set of formulas. Here, we allow 
such a syntactic characterisation of knowledge bases. As well, since belief change is 
phrased in terms of a set of syntactically-distinguished sentences, here the set of atomic 
sentences, it also resembles base revision. However, in [3] it is shown that this approach
(essentially) satisfies the AGM revision and contraction postulates [1], with the
exception of the revision postulate ($K \dotplus 8$) and the contraction postulate ($K \dot{-} 8$). In particular,
and in contrast to most approaches to belief base revision, it satisfies the postulate of
irrelevance of syntax, in that the results of belief change do not depend on the syntactic
expression of sentences in a belief change scenario.

We note also that the approach is capable of expressing multiple contraction [9]
wherein, for belief change scenario $(K, U_1, U_2)$, every element of $U_2$ would be
individually consistent with the resulting knowledge base. However, we do not pursue this
generalisation here; rather, we assume that the elements of $U_2$ will be jointly consistent
in the resulting knowledge base.

3 Reductions 

In this section, we present efficient (polynomial-time constructible) reductions of the 
relevant reasoning tasks in the context of belief change scenarios into QBFs. More 
specifically, we deal with the following basic reasoning tasks: 

DEFEXT : Decide whether a given belief change scenario B has some consistent defini- 
tional extension. 

CHOICE : Given a belief change scenario B and some formula $\phi$, decide whether $\phi$ is
contained in at least one consistent definitional extension of B.

SKEPTICAL : Given a belief change scenario B and some formula $\phi$, decide whether $\phi$
is contained in all consistent definitional extensions of B.

For all of the above decision problems, we also treat the corresponding search problems. 

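Given a belief change scenario $B = (K, U_1, U_2)$, from now on we assume that
$K$, $U_1$, and $U_2$ are finite; thus, each of these sets can also be represented as the conjunction of its
elements.

Consider $B$ as above, and let $V$ be the set of variables occurring in $K$, $U_1$, or $U_2$.
We define the following basic module:

$$M[B] = K' \wedge (V_{EQ} \leq (V \equiv V')) \wedge U_1,$$

where $V_{EQ} = \{v_{EQ} \mid v \in V\}$ is a set of new variables and $V_{EQ} \leq (V \equiv V')$ stands for the
conjunction of the implications $v_{EQ} \rightarrow (v \equiv v')$ for all $v \in V$ (cf. formula (3) below).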

The decision problem DEFEXT, together with the corresponding search problem, can
be expressed as follows:

Theorem 1. Let $B = (K, U_1, U_2)$ be a belief change scenario in $\mathcal{L}_{\mathcal{P}}$ and $V$ the atoms
occurring in $B$. Consider the following QBF:

$$\mathcal{T}_{ext}[B] = \exists V \exists V' (M[B] \wedge U_2) \wedge \bigwedge_{v \in V} \big( \neg v_{EQ} \rightarrow \neg \exists V \exists V' ((v \equiv v') \wedge M[B] \wedge U_2) \big). \qquad (2)$$

Then, $B$ has a consistent definitional extension iff $\mathcal{T}_{ext}[B]$ is satisfiable. Moreover, the
satisfying truth assignments of the free variables $V_{EQ}$ of $\mathcal{T}_{ext}[B]$ are in a one-to-one
correspondence to the consistent definitional extensions of $B$. This correspondence is
provided by the following two mappings:

1. For a model $M$ of $\mathcal{T}_{ext}[B]$, the corresponding consistent definitional extension of
$B$ is given by $Cn(K' \cup \{v \equiv v' \mid v_{EQ} \in M\} \cup U_1) \cap \mathcal{L}_{\mathcal{P}}$.

2. For a consistent definitional extension $Cn(K' \cup EQ \cup U_1) \cap \mathcal{L}_{\mathcal{P}}$ of $B$, the set of
atoms $M = \{v_{EQ} \mid (v \equiv v') \in EQ\}$ is a model of $\mathcal{T}_{ext}[B]$.

Note that $V_{EQ}$ constitutes the set of free variables of $\mathcal{T}_{ext}[B]$. Intuitively, $V_{EQ}$ guesses
a set $EQ$ of equivalences underlying a definitional extension of $B$. The first conjunct of
$\mathcal{T}_{ext}[B]$ checks consistency, and the second conjunct checks whether $EQ$ is maximal
with respect to set containment.

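For illustration, consider the belief change scenario $B = (\{p \wedge q\}, \{\neg p \vee \neg q\}, \emptyset)$
from Section 2.2, and the corresponding QBF $\mathcal{T}_{ext}[B]$. The free variables of $\mathcal{T}_{ext}[B]$
are given by $\{p_{EQ}, q_{EQ}\}$, so we get the following four interpretations serving as potential
models of $\mathcal{T}_{ext}[B]$:

$M_1 = \emptyset$; $M_2 = \{p_{EQ}\}$; $M_3 = \{q_{EQ}\}$; $M_4 = \{p_{EQ}, q_{EQ}\}$.

Since $B$ has two consistent definitional extensions, generated by $EQ_1 = \{p \equiv p'\}$ and
$EQ_2 = \{q \equiv q'\}$ (cf. Table 1), we expect $M_2$ and $M_3$ to be models of $\mathcal{T}_{ext}[B]$. Let us
first look at the left conjunct $\exists V \exists V' (M[B] \wedge U_2)$ of $\mathcal{T}_{ext}[B]$ in (2). For $B$ as above,
we obtain

$$\exists V \exists V' (M[B] \wedge U_2) = \exists p\, q\, p'\, q' \big( (p' \wedge q') \wedge (p_{EQ} \rightarrow (p \equiv p')) \wedge (q_{EQ} \rightarrow (q \equiv q')) \wedge (\neg p \vee \neg q) \big). \qquad (3)$$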

This QBF has three models, viz. $M_1$, $M_2$, and $M_3$. Interpretation $M_1$ is a model because
both conjuncts $(p_{EQ} \rightarrow (p \equiv p'))$ and $(q_{EQ} \rightarrow (q \equiv q'))$ of (3) evaluate to true (given
that $p_{EQ}, q_{EQ} \notin M_1$), and since the remaining formula $(p' \wedge q') \wedge (\neg p \vee \neg q)$ is consistent.
For $M_2$, we similarly get that $(q_{EQ} \rightarrow (q \equiv q'))$ is true and that
$(p' \wedge q') \wedge (p_{EQ} \rightarrow (p \equiv p')) \wedge (\neg p \vee \neg q)$ is consistent, since $\{p', q', p\}$ is a model of
$(p' \wedge q') \wedge (p \equiv p') \wedge (\neg p \vee \neg q)$.
$M_3$ is a model by analogous arguments. However, $M_4$ is not a model of (3). The reason
for this fact is that, under $M_4$, formula (3) can be reduced to

$$(p' \wedge q') \wedge (p \equiv p') \wedge (q \equiv q') \wedge (\neg p \vee \neg q), \qquad (4)$$

which is not satisfiable.

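Hence, we are left with three possible models of $\mathcal{T}_{ext}[B]$, viz. $M_1$, $M_2$, and $M_3$.
Now we investigate the remaining conjuncts of $\mathcal{T}_{ext}[B]$, which are given by

$$\varphi_1 = \neg p_{EQ} \rightarrow \neg \exists p\, q\, p'\, q' \big( (p \equiv p') \wedge (p' \wedge q') \wedge (p_{EQ} \rightarrow (p \equiv p')) \wedge (q_{EQ} \rightarrow (q \equiv q')) \wedge (\neg p \vee \neg q) \big)$$

and

$$\varphi_2 = \neg q_{EQ} \rightarrow \neg \exists p\, q\, p'\, q' \big( (q \equiv q') \wedge (p' \wedge q') \wedge (p_{EQ} \rightarrow (p \equiv p')) \wedge (q_{EQ} \rightarrow (q \equiv q')) \wedge (\neg p \vee \neg q) \big).$$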

First, consider interpretation $M_2$. Since $p_{EQ} \in M_2$, conjunct $\varphi_1$ evaluates to true,
and it remains to analyse $\varphi_2$. The latter formula evaluates to true if

$$(q \equiv q') \wedge (p' \wedge q') \wedge (p_{EQ} \rightarrow (p \equiv p')) \wedge (q_{EQ} \rightarrow (q \equiv q')) \wedge (\neg p \vee \neg q) \qquad (5)$$

is not satisfiable. Now, given $M_2$, (5) reduces to (4), which is indeed unsatisfiable.
Hence, $M_2$ is a model of $\varphi_2$ and thus also a model of $\mathcal{T}_{ext}[B]$. By a similar argument
it follows that $M_3$ is a model of $\varphi_1 \wedge \varphi_2$. It remains to see that $M_1$ is not a model of
$\varphi_1 \wedge \varphi_2$. In fact, it holds that $\nu_{M_1}(\varphi_1) = \nu_{M_1}(\varphi_2) = 0$. We show the case of $\varphi_1$ (the
case of $\varphi_2$ follows analogously). Since $M_1 = \emptyset$, $\varphi_1$ is false under $M_1$ iff

$$\exists p\, q\, p'\, q' \big( (p \equiv p') \wedge (p' \wedge q') \wedge (p_{EQ} \rightarrow (p \equiv p')) \wedge (q_{EQ} \rightarrow (q \equiv q')) \wedge (\neg p \vee \neg q) \big)$$

is true under $M_1$. Given that both $p_{EQ}$ and $q_{EQ}$ are false under $M_1$, the previous condition
holds iff

$$(p \equiv p') \wedge (p' \wedge q') \wedge (\neg p \vee \neg q) \qquad (6)$$

is satisfiable. Clearly, this is the case, since $\{p, p', q'\}$ is a satisfying truth
assignment for (6). Thus, $M_1$ is not a model of $\varphi_1$. This concludes the proof that $M_1$ is
not a model of $\mathcal{T}_{ext}[B]$.
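
The case analysis above can be replayed mechanically. The following is only a brute-force sketch for this one example, not the QUIP implementation: it enumerates all truth assignments instead of calling a QBF solver, and the function names (`module`, `exists`, `t_ext`) are chosen here for illustration. Since $U_2 = \emptyset$, the conjunct $U_2$ is trivially true and is omitted.

```python
from itertools import product

def iff(a, b):
    return a == b

def implies(a, b):
    return (not a) or b

def module(p, q, p1, q1, p_eq, q_eq):
    """M[B] for B = ({p & q}, {~p | ~q}, {}); p1, q1 stand for p', q'."""
    k_prime = p1 and q1                                  # K' = p' & q'
    links = implies(p_eq, iff(p, p1)) and implies(q_eq, iff(q, q1))
    u1 = (not p) or (not q)                              # U_1 = ~p | ~q
    return k_prime and links and u1

def exists(pred):
    """Existential quantification over p, q, p', q' by enumeration."""
    return any(pred(*bits) for bits in product([False, True], repeat=4))

def t_ext(p_eq, q_eq):
    """Truth value of T_ext[B] under an assignment to the free variables."""
    consistent = exists(lambda p, q, p1, q1: module(p, q, p1, q1, p_eq, q_eq))
    phi1 = implies(not p_eq, not exists(
        lambda p, q, p1, q1: iff(p, p1) and module(p, q, p1, q1, p_eq, q_eq)))
    phi2 = implies(not q_eq, not exists(
        lambda p, q, p1, q1: iff(q, q1) and module(p, q, p1, q1, p_eq, q_eq)))
    return consistent and phi1 and phi2

# Assignments are pairs (p_EQ, q_EQ); M_2 = (True, False), M_3 = (False, True).
models = [(p_eq, q_eq)
          for p_eq, q_eq in product([False, True], repeat=2)
          if t_ext(p_eq, q_eq)]
print(models)  # [(False, True), (True, False)]
```

The two surviving assignments correspond to $M_3$ and $M_2$, in that order, which matches the two consistent definitional extensions of the example.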

We can also directly characterise the models of consistent definitional extensions, 
by means of the following construction: 

Lemma 1. Let $B = (K, U_1, U_2)$ be a belief change scenario in $\mathcal{L}_{\mathcal{P}}$, let $V$ be the
atoms occurring in $B$, and let $Cn(K' \cup EQ \cup U_1) \cap \mathcal{L}_{\mathcal{P}}$ be a consistent definitional
extension. Define $M^*[B]$ as the result of substituting all occurrences of $v_{EQ}$ in $M[B]$
by $\top$ if $(v' \equiv v) \in EQ$ and by $\bot$ otherwise. Then, the models of $\exists V' M^*[B]$ coincide
with the models of $Cn(K' \cup EQ \cup U_1) \cap \mathcal{L}_{\mathcal{P}}$.

Theorem 2. Let $B$ be as in Lemma 1 and consider the QBF

$$\exists V_{EQ} (\mathcal{T}_{ext}[B] \wedge \exists V' M[B]). \qquad (7)$$

Then, an interpretation $M$ of the free variables $V$ of (7) is a model of (7) iff $M$ is a
model of some definitional extension of $B$.

Next, we discuss the translations of the reasoning tasks CHOICE and SKEPTICAL.

Theorem 3. Let $B = (K, U_1, U_2)$ be a belief change scenario in $\mathcal{L}_{\mathcal{P}}$, $\phi$ a formula, and
$V$ the atoms occurring in $B$ and $\phi$. Consider the following QBFs:

$$\mathcal{T}_{choice}[B, \phi] = \mathcal{T}_{ext}[B] \wedge \forall V (\exists V' M[B] \rightarrow \phi);$$

$$\mathcal{T}_{skept}[B, \phi] = \mathcal{T}_{ext}[B] \wedge \neg \forall V (\exists V' M[B] \rightarrow \phi).$$

Then:

1. $\phi$ is contained in at least one definitional extension of $B$ iff $\exists V_{EQ} \mathcal{T}_{choice}[B, \phi]$
evaluates to true. Moreover, the satisfying truth assignments of the free variables $V_{EQ}$
of $\mathcal{T}_{choice}[B, \phi]$ are in a one-to-one correspondence to the definitional extensions
of $B$ containing $\phi$.

2. $\phi$ is contained in all definitional extensions of $B$ iff $\neg \exists V_{EQ} \mathcal{T}_{skept}[B, \phi]$ evaluates
to true. Moreover, the satisfying truth assignments of the free variables $V_{EQ}$ of
$\mathcal{T}_{skept}[B, \phi]$ are in a one-to-one correspondence to the definitional extensions of $B$
not containing $\phi$.
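
To make the two encodings concrete, the sketch below evaluates them by exhaustive enumeration for the scenario $B = (\{p \wedge q\}, \{\neg p \vee \neg q\}, \emptyset)$ used earlier. It is an illustration under stated assumptions rather than a solver-based implementation: the helper names (`module`, `t_ext`, `entails`, `choice`, `skeptical`) are introduced here, and a query formula $\phi$ is assumed to be given as a Python function of the atoms $p$ and $q$.

```python
from itertools import product

BOOLS = (False, True)

def module(p, q, p1, q1, p_eq, q_eq):
    """M[B] for B = ({p & q}, {~p | ~q}, {}); p1, q1 stand for p', q'."""
    return (p1 and q1
            and ((not p_eq) or p == p1)
            and ((not q_eq) or q == q1)
            and ((not p) or (not q)))

def t_ext(p_eq, q_eq):
    """T_ext[B]: consistency plus maximality of the guessed equivalences."""
    def sat(extra):
        return any(extra(p, q, p1, q1) and module(p, q, p1, q1, p_eq, q_eq)
                   for p, q, p1, q1 in product(BOOLS, repeat=4))
    return (sat(lambda p, q, p1, q1: True)
            and (p_eq or not sat(lambda p, q, p1, q1: p == p1))
            and (q_eq or not sat(lambda p, q, p1, q1: q == q1)))

def entails(p_eq, q_eq, phi):
    """forall V (exists V' M[B] -> phi), for fixed values of V_EQ."""
    return all(phi(p, q)
               for p, q in product(BOOLS, repeat=2)
               if any(module(p, q, p1, q1, p_eq, q_eq)
                      for p1, q1 in product(BOOLS, repeat=2)))

def choice(phi):
    """True iff exists V_EQ T_choice[B, phi] evaluates to true."""
    return any(t_ext(pe, qe) and entails(pe, qe, phi)
               for pe, qe in product(BOOLS, repeat=2))

def skeptical(phi):
    """True iff exists V_EQ T_skept[B, phi] evaluates to false."""
    return not any(t_ext(pe, qe) and not entails(pe, qe, phi)
                   for pe, qe in product(BOOLS, repeat=2))

print(choice(lambda p, q: p))          # True: p holds in one extension
print(skeptical(lambda p, q: p))       # False: p fails in the other one
print(skeptical(lambda p, q: p != q))  # True: exactly one of p, q holds in both
```

For this scenario the two extensions are $Cn(p \wedge \neg q)$ and $Cn(\neg p \wedge q)$, so $p$ is a choice consequence but not a skeptical one.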

Theorems 1 and 3 provide encodings of reasoning tasks for arbitrary belief scenarios.
In particular, they include the characterisation of the corresponding reasoning tasks 
associated with revision and contraction, as illustrated by the revision example discussed 
previously. For convenience, we list the tasks for revision: 

Rdefext : Given a knowledge base $K$ and some formula $\alpha$, decide whether a consistent
definitional extension of $B = (K, \{\alpha\}, \emptyset)$ exists.

Rchoice : Given a knowledge base $K$ and formulas $\alpha$ and $\phi$, decide whether there is
some choice revision $K \dot{+}_c \alpha$ containing $\phi$.

Rskeptical : Given a knowledge base $K$ and formulas $\alpha$ and $\phi$, decide whether $\phi$ is
contained in the skeptical revision $K \dot{+} \alpha$.

The corresponding tasks for belief contraction, denoted by Cdefext, Cchoice, and
Cskeptical, are defined accordingly.

From the reductions described above, we immediately obtain upper bounds for the
computational complexity of the current belief change framework. This follows from
the fact that our QBF encodings are polynomial and that the quantifier order of QBFs
determines at which level of the polynomial hierarchy the associated evaluation problem
lies. More specifically, the evaluation problem of QBFs having quantifier order $\exists\forall$ is
complete for the class $\Sigma_2^P$; dually, the evaluation problem of QBFs having quantifier
order $\forall\exists$ is complete for $\Pi_2^P$. Thus, since completeness of a decision problem $D$ for a
complexity class $C$ implies membership of $D$ in $C$, inspecting the quantifier order
of the above translations yields the upper complexity bounds for the corresponding
decision problems. In fact, for all of the above versions of CHOICE and SKEPTICAL,
these bounds are tight, i.e., the completeness property is preserved for these tasks.
This can be seen by providing polynomial mappings from the evaluation problem of
QBFs having one quantifier alternation into the respective decision problems associated
with belief change scenarios (these mappings represent in effect the inverse relations
of the encodings described in Theorem 3). However, for DEFEXT and its variants we
actually obtain a lower complexity than for CHOICE and SKEPTICAL (provided
the polynomial hierarchy does not collapse), because deciding whether a given belief
change scenario $B = (K, U_1, U_2)$ has a consistent definitional extension is equivalent
to checking whether $K' \cup U_1 \cup U_2$ is consistent.

Summarizing, we obtain the following complexity results, strengthening the com- 
plexity analysis discussed in [3]: 

Theorem 4. We have the following completeness results: 

1. DEFEXT, Rdefext, and Cdefext are NP-complete.

2. CHOICE, Rchoice, and Cchoice are $\Sigma_2^P$-complete.

3. SKEPTICAL, Rskeptical, and Cskeptical are $\Pi_2^P$-complete.

Let us remark that the NP-completeness of DEFEXT shows that this task can in principle
be handled by an existentially quantified QBF, expressing a simple consistency
check, as argued above. However, $\mathcal{T}_{ext}[B]$ is somewhat more involved
(having quantifier order $\exists\forall$) because this QBF has been constructed
to deal also with the corresponding search problem of DEFEXT, which is actually more
complex than a simple yes-no answer because of the inherent maximality checks.

4 Implementation

Our methodology for expressing reasoning tasks associated with belief change scenarios
in terms of quantified Boolean formulas is motivated by the availability of several
practically efficient QBF-solvers. Among the different tools, there is a propositional
theorem-prover, boole, based on binary decision diagrams (the system can be down- 
loaded from the Web at http://www.cs.cmu.edu/~modelcheck/bdd.html), a system us- 
ing a generalized resolution principle [12], several provers implementing an extended 
Davis-Putnam procedure [2; 8; 16], as well as a distributed algorithm running on a 
PC-cluster [8]. With the exception of boole, these tools do not accept arbitrary QBFs, 
but require the input formula to be in prenex conjunctive normal form. To avoid an 
exponential increase of formula size, structure preserving normal form translations [4; 
14] can be used to translate a general QBF into the required normal form. In contrast to the 
usual normal form translation based on distributivity laws, structure preserving normal 
form translations introduce new labels for subformula occurrences and are polynomial 
in the length of the input formula. 
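
The idea of labelling subformula occurrences can be sketched as follows. This is a minimal illustration in the style of Tseitin's encoding for a propositional matrix only; it is not necessarily the exact translation of [4; 14], and the tuple-based formula representation and the `_t` label names are assumptions made here. In a QBF setting, the new labels would additionally have to be bound by an existential quantifier block, which the sketch leaves out.

```python
from itertools import count

def tseitin(formula):
    """Return (root_label, clauses) for a formula given as nested tuples.

    A formula is an atom (a string), ("not", f), ("and", f, g) or ("or", f, g).
    A literal is a (name, polarity) pair; a clause is a list of literals.
    """
    fresh = count(1)
    clauses = []

    def label(f):
        if isinstance(f, str):
            return f
        x = f"_t{next(fresh)}"  # new label for this subformula occurrence
        if f[0] == "not":
            a = label(f[1])
            # x <-> ~a
            clauses.append([(x, False), (a, False)])
            clauses.append([(x, True), (a, True)])
        else:
            a, b = label(f[1]), label(f[2])
            if f[0] == "and":
                # x <-> (a & b)
                clauses.append([(x, False), (a, True)])
                clauses.append([(x, False), (b, True)])
                clauses.append([(x, True), (a, False), (b, False)])
            else:
                # x <-> (a | b)
                clauses.append([(x, True), (a, False)])
                clauses.append([(x, True), (b, False)])
                clauses.append([(x, False), (a, True), (b, True)])
        return x

    root = label(formula)
    clauses.append([(root, True)])  # assert the label of the whole formula
    return root, clauses

root, cnf = tseitin(("or", ("and", "p", "q"), ("not", "r")))
print(root, len(cnf))  # _t1 9
```

Each connective contributes a constant number of clauses, so the output grows linearly with the number of subformula occurrences, in contrast to the possible exponential growth under distributivity-based translation.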

The translations discussed in the previous section have been implemented as a special
module of the reasoning system QUIP [7; 6; 5], which is a prototype tool for solving
several nonmonotonic reasoning tasks based on reductions to QBFs. Currently, QUIP
handles tasks for logic-based abduction, default logic, several types of modal nonmonotonic
logics, and disjunctive logic programs under the stable model semantics.

The general architecture of QUIP is depicted in Figure 1. QUIP consists of three 
parts, namely the filter program, a QBF-evaluator, and the interpreter int. The input 
filter translates the given problem description (in our case, a belief change scenario and 
a specified reasoning task) into the corresponding quantified Boolean formula, which 
is then sent to the QBF-evaluator. The current version of QUIP provides interfaces to 
most of the sequential QBF-solvers mentioned above. For the solvers requiring prenex 
normal form, the QBFs are translated into structure preserving normal form. The result of 
the QBF-evaluator is interpreted by int. Depending on the capabilities of the employed 
QBF-evaluator, int provides an explanation in terms of the underlying problem instance 
(e.g., listing all consistent definitional extensions of a given belief change scenario). This 
task relies on a protocol mapping of internal variables of the generated QBF into concepts 
of the problem description which is provided by filter. 

The system QUIP has been implemented in C using standard tools like LEX and 
YACC (comprising a total of 2000 lines of code, excluding the used QBF-solver); it runs 
currently in a Unix environment (Sun/Solaris and Linux), but is easily portable to other 
operating systems as well. 

5 Conclusion

We have shown how belief revision and belief contraction within belief change scenarios
can be axiomatised by means of quantified Boolean formulas. This approach has
several benefits: First, the given axiomatics provides us with further insight about how
belief revision and contraction work within belief change scenarios. Second, this axiomatisation
allows us to furnish upper bounds and precise complexity results, going
beyond those presented in [3]. Last but not least, we obtain a straightforward implementation
technique for belief change in belief change scenarios by appeal to the existing
nonmonotonic reasoning framework QUIP [7; 6].

Abstract. In daily life we have two kinds of knowledge at our disposal:
pieces of information ruling out what is known to be impossible on the one
hand, and case reports pointing out things which are indeed possible on the other. The
fusion of the first type of information is basically conjunctive, while it is 
disjunctive in the other case. The second type of information has been 
largely neglected by the logical tradition. Both types can be pervaded with 
uncertainty. The paper first describes how the two types of information can 
be accommodated in the possibility theory and in the evidence theory 
frameworks. Then it is shown how the existence of the two types of 
information can shed new light on the revision of a knowledge base when 
receiving new information. 



1 Introduction 

In the logical tradition, a piece of information delimits a set of possible worlds, and 
any interpretation which is not a model for the set of available pieces of information 
is regarded as impossible. The conjunctive combination of pieces of information 
represented in this framework amounts to performing the intersection of the sets of 
possible worlds representing these granules of information. Thus the more 
information we have, the more restricted is the corresponding set of possible worlds. 
This set may even become empty in case of inconsistent information. 

But not all the knowledge we have is of this kind. We also accumulate
information from observations or reports about particular cases, examples and so on. 
This information, which rather assesses the feasibility of possible worlds (in the sense 
that they can be done), is combined disjunctively (without ever leading to any 
inconsistency). Indeed the more information of this kind we have, the larger the set of 
possible worlds which is granted. Interestingly enough, the semantic entailment goes 
here in a way which is the opposite of the situation in the classical logical 
representation framework. Namely, if all the worlds in $A$ are feasible, we can conclude
from this piece of information that all the worlds in $B$ are feasible only if the set
inclusion $B \subseteq A$ holds, and nothing is said about the worlds outside $A$.

In fact, we may often have the two kinds of information at our disposal 
simultaneously. For instance, if we are interested in the price of a particular second 
hand car, we may know on the one hand that the laws of the market force the price to 
be in some range, while on the other hand the observation of similar cases may let us 
think that such and such prices are indeed feasible. The coexistence of the two kinds of
information, distinguishing things which are not impossible from things which are surely
possible, and their joint representation have recently been discussed in various
settings including modal logic by Dubois, Hajek, and Prade [3]. It has also been
considered in the restricted setting of fuzzy rules by Ughetto et al. [17].

Clearly, these two kinds of information can be pervaded with uncertainty. 
This is investigated in the frameworks of possibility theory and evidence theory in 
this paper. As we suggest in the second half of this paper, this helps to shed new
light on some belief revision problems, as first hinted in [10].



2 Incomplete information about classes and about objects 

As already said, pieces of information about the value of an attribute for an object can
be of two different types. In its simplest form this gives rise to a knowledge
representation framework made of a pair of ordinary subsets $(GP, NI)$ where $GP$ stands
for 'Guaranteed Possible', and $NI$ for 'Not Impossible'; see [3]. When information is
consistent, we should have $GP \subseteq NI$, i.e. what is claimed to be feasible cannot be
ruled out as being impossible. This framework enables us to capture incomplete
information under two different forms, one pertaining to the description of classes of 
objects, and the other concerning the description of the objects themselves. 

The range of existing values $E$ for an attribute used in the description of a
class of objects may be incompletely known. In other words, we may only know a
lower bound $GP$, and an upper bound $NI$ of the set $E$ of attribute values existing in
the class, i.e. $GP \subseteq E \subseteq NI$. When information about the class is complete for the
considered attribute, we have $GP = E = NI$, i.e. for any value in the attribute domain,
it is known whether or not there exists at least one object in the class whose attribute value
is this one, in other words whether the value is feasible or not. Thus, given the attribute
domain $U$, incomplete information about the set of attribute values existing in the
class can be modelled by the twofold set $(GP, NI)$ with $GP \subseteq NI \subseteq U$. $GP$ gathers
attribute values which are known as possible for sure and which are said to have a
guaranteed possibility (e.g., because such values have been encountered, or are known
as feasible) on the one hand, while the set $U - NI$ indicates what attribute values are
known as impossible for the objects in the class (e.g., as the result of generic laws,
principles, ...) on the other hand. $NI$ is thus the set of attribute values which are not
impossible for the objects in the class, but which are not necessarily guaranteed as
possible. For instance, consider a class of cars where the attribute in which we are
interested is the price. Then we may know that the price of all the cars in the class
should be within some range $NI$, while some prices gathered in $GP \subseteq NI$ are surely
possible; the prices in $NI - GP$ might be possible, since they are not forbidden
although they have never been reported. Extreme situations are retrieved when either
$GP = \emptyset$ (the situation captured by classical logic), or when $NI = U$ (we only have a
repertory of reported cases). $GP = \emptyset$ only indicates a total lack of observation-based
information, while $NI = \emptyset$ means inconsistency (at least one value should be
possible). Thus $NI \neq \emptyset$ is assumed.

Now if we are considering an object (a car in our example) which is just
known to belong to the class (described as said above), we may wonder about the
possible values of some attribute (here the price) for this object. The attribute is here
supposed to be single-valued. Note that complete information at the object level not
only means $GP = NI$ but also that this set is a singleton. Then, in the general case,
$NI$ and $GP$ play the role of $\{0,1\}$-valued distributions; in the above example, if $u \notin NI$
then $u$ cannot be the price of the car; if $u \in GP$, $u$ can indeed be the price of the car,
while if $u \in (NI - GP)$ nothing forbids $u$ from being the price of the car, but $u$ does
not belong to the kernel $GP$ of values which are possible for sure ($u$ may just be the
price of the car).

More generally, given the statement s = 'the value of the attribute for the
object is in $S$' and the complete information $E$ about the class, the four standard modal
situations (s is certainly true if $E \subseteq S$; s is possibly true if $E \cap S \neq \emptyset$; s is possibly
false if $E \cap (U - S) \neq \emptyset$; s is certainly false if $E \cap S = \emptyset$) are now refined in a larger
set of situations ($S \neq \emptyset$, $GP \neq \emptyset$ and $NI \neq U$ are assumed):

- s is certainly true (ct) if $NI \subseteq S$;

- s is accepted as true (at) if $GP \subseteq S$;

- s may be true (mat) if $GP \cap S \neq \emptyset$;

- s might be true (mit) if $NI \cap S \neq \emptyset$;

- s is guaranteed to be possible (gp) if $GP \supseteq S$ (all values in $S$ have been reported);

- s might be false (mif) if $NI \cap (U - S) \neq \emptyset$;

- s may be false (maf) if $GP \cap (U - S) \neq \emptyset$ (s is guaranteed to be not certain);

- s is dubious (d) if $GP \cap S = \emptyset$;

- s is certainly false (cf) if $NI \cap S = \emptyset$.

Note that ct and mif are mutually exclusive, as well as mit and cf, while ct
entails at, which entails mat (if $GP \neq \emptyset$), which entails mit; similarly cf entails d,
which entails maf, which entails mif; lastly, gp entails mat.
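
As an illustration, the nine situations can be checked directly with set operations. The sketch below is only meant to make the definitions concrete; the function name `situations` and the example price values are assumptions chosen here, not taken from the paper.

```python
def situations(S, GP, NI, U):
    """Return the labels that hold for the statement 'the value is in S'."""
    S, GP, NI, U = map(frozenset, (S, GP, NI, U))
    rest = U - S
    tests = {
        "ct": NI <= S,            # certainly true
        "at": GP <= S,            # accepted as true
        "mat": bool(GP & S),      # may be true
        "mit": bool(NI & S),      # might be true
        "gp": GP >= S,            # guaranteed to be possible
        "mif": bool(NI & rest),   # might be false
        "maf": bool(GP & rest),   # may be false
        "d": not (GP & S),        # dubious
        "cf": not (NI & S),       # certainly false
    }
    return {name for name, holds in tests.items() if holds}

# Hypothetical price domain (in thousands), with GP included in NI.
U = {1, 2, 3, 4, 5, 6}
NI = {2, 3, 4, 5}
GP = {3, 4}

print(sorted(situations({3, 4, 5}, GP, NI, U)))  # ['at', 'mat', 'mif', 'mit']
print(sorted(situations({1, 6}, GP, NI, U)))     # ['cf', 'd', 'maf', 'mif']
```

In the first query the reported prices all lie in $S$, so s is accepted as true, yet the unreported but not impossible price 2 keeps it from being certain.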



3 The possibility theory setting 

The information about the possible range of attribute values in a class can be pervaded 
with uncertainty. Different uncertainty frameworks which are compatible with the set- 
based representation of incomplete information can be used for refining the above 
view where laws and observations can both provide uncertain information. This is 
especially the case for belief function-based evidence theory and for possibility theory. 
Let us first consider this latter framework which is simpler. 

3.1 Background 

In possibility theory [2], the basic building element is the notion of a possibility
distribution $\pi$ which is defined from the considered attribute domain to a linearly
ordered scale, finite or not (e.g., [0, 1]). Associated with $\pi$ are a possibility measure and
a necessity measure respectively denoted by $\Pi$ and $N$, and defined by

$\Pi(A) = \sup_{u \in A} \pi(u)$ if $A \neq \emptyset$, and $\Pi(\emptyset) = 0$;

$N(A) = 1 - \Pi(A^c) = \inf_{u \notin A} (1 - \pi(u))$ if $A \neq U$, and $N(U) = 1$;

where $A^c$ denotes the complement of $A$ ($u \in A^c \Leftrightarrow u \notin A$). As can be seen, given the
partition $(A, A^c)$ of the attribute domain, the information represented by $\pi$ is
summarized by two numbers which are directly related to the maximum of $\pi$ over $A$
and over $A^c$. An event $A$ is all the more possible as there is a model $u$ of $A$ which is
highly possible; $A$ is all the more necessarily true, or certain, as there is no counter-model
of $A$ which is highly possible. Contradictions are impossible ($\Pi(\emptyset) = 0$) and
tautologies are certain ($N(U) = 1$). Consistency requires $\pi$ to be normalized, i.e. $\exists u \in U$,
$\pi(u) = 1$, in order to have $\Pi(U) = 1$, $N(\emptyset) = 0$ and more generally $\forall A$, $N(A) \leq \Pi(A)$,
where $N(A) = 0$ or $\Pi(A) = 1$. This expresses that an event $A$ should be fully
possible ($\Pi(A) = 1$) before being somewhat certain ($N(A) > 0$).
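
On a finite domain both measures reduce to a maximum over a set. The following sketch assumes a distribution stored as a Python dictionary; the function names and the three-valued example distribution are illustrative choices made here.

```python
def possibility(pi, A):
    """Pi(A): largest value of pi over A, with Pi(empty set) = 0."""
    return max((pi[u] for u in A), default=0.0)

def necessity(pi, A):
    """N(A) = 1 - Pi(complement of A)."""
    complement = set(pi) - set(A)
    return 1.0 - possibility(pi, complement)

# Hypothetical normalized distribution over three price ranges.
pi = {"low": 0.25, "mid": 1.0, "high": 0.5}

print(possibility(pi, {"low", "high"}))  # 0.5
print(necessity(pi, {"mid", "high"}))    # 0.75
print(necessity(pi, {"low"}))            # 0.0
```

Because the example distribution is normalized, every event with positive necessity, such as {"mid", "high"}, has possibility 1.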

The possibility distribution $\pi$ is not always directly available, but may be
only implicitly defined through constraints of the form $N(A_i) \geq \alpha_i$ for some subsets
$A_i$ with $i = 1, \ldots, k$, expressing that some events $A_i$ are somewhat certain, as is the case
in possibilistic logic [4]. Then applying the minimal specificity principle, we can
compute the least restrictive possibility distribution $\pi^*$ which agrees with the
constraints. $\pi^*$ is given by

$$\pi^*(u) = \min_{i=1,\ldots,k} \max(A_i(u), 1 - \alpha_i) \qquad (1)$$

where $A_i(u) = 1$ if $u \in A_i$ and $A_i(u) = 0$ if $u \notin A_i$. $\pi^*$ is the greatest distribution such
that the constraints are satisfied (it can be checked that $N(A_i) \geq \alpha_i$ where $N$ is
computed from $\pi^*$).

Besides, a qualitative summarization of the information conveyed by $\pi$ w.r.t.
a partition $(A, A^c)$ should also involve the minimal values of $\pi$ over $A$, and over $A^c$.
This is why two new set functions are worth introducing [7]. Namely

$\Delta(A) = \inf_{u \in A} \pi(u)$; $\nabla(A) = 1 - \Delta(A^c)$.

$\Delta$ is called the guaranteed possibility function. $\nabla(A)$ estimates the potential certainty of
$A$. Starting with a set of constraints of the form $\Delta(B_j) \geq \beta_j$ with $j = 1, \ldots, n$, expressing
that (all) the values in $B_j$ are guaranteed to be possible at least at level $\beta_j$, and applying
a principle of maximal specificity, yields the smallest possibility distribution $\pi_*$ such
that the constraints are satisfied. Note that this principle is the converse of the one
used in (1), and is in the spirit of a closed-world assumption: only what is said to be
(somewhat) guaranteed possible is considered as so. Namely

$$\pi_*(u) = \max_{j=1,\ldots,n} \min(B_j(u), \beta_j). \qquad (2)$$
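
Formulas (1) and (2) can be computed directly on a finite domain. The sketch below assumes that each constraint is given as a pair (subset, level); the function names and the numerical example are hypothetical and serve only to show the min-max and max-min aggregation.

```python
def upper_distribution(domain, certainty):
    """pi^*(u) = min_i max(A_i(u), 1 - alpha_i), from constraints N(A_i) >= alpha_i."""
    return {u: min((1.0 if u in A else 1.0 - alpha for A, alpha in certainty),
                   default=1.0)
            for u in domain}

def lower_distribution(domain, feasibility):
    """pi_*(u) = max_j min(B_j(u), beta_j), from constraints Delta(B_j) >= beta_j."""
    return {u: max((beta if u in B else 0.0 for B, beta in feasibility),
                   default=0.0)
            for u in domain}

domain = [1, 2, 3, 4, 5]
certainty = [({2, 3, 4}, 0.75), ({1, 2, 3}, 0.5)]   # N(A_i) >= alpha_i
feasibility = [({2, 3}, 0.5), ({3}, 1.0)]           # Delta(B_j) >= beta_j

print(upper_distribution(domain, certainty))
# {1: 0.25, 2: 1.0, 3: 1.0, 4: 0.5, 5: 0.25}
print(lower_distribution(domain, feasibility))
# {1: 0.0, 2: 0.5, 3: 1.0, 4: 0.0, 5: 0.0}
```

With no constraints at all, the first function returns the constant 1 and the second the constant 0, which is one way to read the open-world versus closed-world flavour of the two principles.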

3.2 Possibilistic representation of the two types of information 

Assume that we have at our disposal two collections of pieces of knowledge, $\{N(A_i) \geq \alpha_i, i = 1, \ldots, k\}$
and $\{\Delta(B_j) \geq \beta_j, j = 1, \ldots, n\}$, expressing respectively facts about which
we are somewhat certain (the true state of the world cannot be outside the subsets $A_i$
up to exceptional cases), and sets of values that are known as being feasible candidates
for the true state of the world. Then we can derive the two distributions $\pi^*$ and $\pi_*$,
defined in 3.1. These two distributions should be such that

$$\forall u, \pi_*(u) \leq \pi^*(u) \qquad (3)$$

in order to guarantee the mutual consistency of the two sets of pieces of information,
namely, what is known as being somewhat feasible should not be ruled out as being
somewhat impossible. Thus, the pair $(\pi_*, \pi^*)$ can be viewed as approximating an
unknown possibility distribution $\pi$ from above and from below, which is not
completely defined from the available pieces of information and which would reflect
the fuzzy range of existing attribute values for a class of objects, namely we have

$$\pi_* \leq \pi \leq \pi^*. \qquad (4)$$

The normalization of $\pi$ entails that $\pi^*$ should also be normalized; $\pi_*$ may
not be normalized. Clearly, this generalizes the situation described in Section 2, where
$\pi(u) = E(u)$, $\pi_*(u) = GP(u)$ and $\pi^*(u) = NI(u)$ with $GP \subseteq NI$, and where $NI \neq \emptyset$
while we may have $GP = \emptyset$. Thus the possibilistic modelling provides a graded view
of the feasibility and of the impossibility. Complete ignorance, i.e. absence of any
information of any kind, is modelled by $\forall u$, $\pi_*(u) = 0$, i.e. $GP = \emptyset$ (no observation
reported) and by $\forall u$, $\pi^*(u) = 1$, i.e. $NI = U$ (no value is (even somewhat) impossible).
The upper distribution is the basis for computing beliefs, i.e. what is held for
somewhat certain, since $N(A) = \inf_{u \notin A} (1 - \pi(u)) \geq \inf_{u \notin A} (1 - \pi^*(u))$ due to (4).
Similarly, what is held as possible for sure can be estimated from
$\Delta(A) = \inf_{u \in A} \pi(u) \geq \inf_{u \in A} \pi_*(u)$ due to (4). For instance, if $\forall u$, $\pi^*(u) = 1$, we have no
(non-trivial) beliefs, although we may know that some values are indeed feasible.

Remarks

1. $GP(u_1) = GP(u_2) = 1$ means that both $u_1$ and $u_2$ are feasible for the objects in the
class. If we only knew that $u_1$ or $u_2$ is feasible, this would lead us to work
with disjunctions of distributions, namely $\pi_* \leq \pi$ or $\pi_*' \leq \pi$, where $\pi_*(u_1) = 1$ and
$\pi_*(u_2) = 0$, $\pi_*'(u_1) = 0$ and $\pi_*'(u_2) = 1$.

2. Associated with $\pi_*$ and $\pi^*$, we can build the possibility distribution $\pi$ from $2^U$ to
$[0,1]$, which estimates to what extent it is possible that a subset $W$ be the exact range
$E$ of attribute values for the objects in the class. Namely,

$$\pi(W) = \sup\{\min(\alpha, 1 - \beta) : GP_\beta \subseteq W \subseteq NI_\alpha\} \qquad (5)$$

where $GP_\alpha = \{u : \pi_*(u) > \alpha\}$ and $NI_\alpha = \{u : \pi^*(u) \geq \alpha\}$.
Note that $\pi$ is normalized as soon as $GP_0 = \{u : \pi_*(u) > 0\} \subseteq NI_1 = \{u : \pi^*(u) = 1\}$, i.e.,
the support of $GP$ (gathering the values which are somewhat feasible) is included in
the core of $NI$ (made of the values which are totally possible).

4 The evidence theory setting 

The presentation follows what was done for the possibility theory setting in the
previous section, and might provide a solution within belief function theory [12]. Let
$m$ be a basic belief assignment from $2^U$ to $[0,1]$, with associated focal elements $F_i$
such that $\sum_i m(F_i) = 1$, with possibly $m(\emptyset) \neq 0$. Then three set functions are
classically associated with $m$, namely the belief, plausibility and commonality
functions

$bel(A) = \sum_{i : F_i \subseteq A, F_i \neq \emptyset} m(F_i)$;

$pl(A) = 1 - m(\emptyset) - bel(A^c) = \sum_{i : F_i \cap A \neq \emptyset} m(F_i)$;

$q(A) = \sum_{i : F_i \supseteq A} m(F_i)$.

They generalize necessity, possibility and guaranteed possibility functions in
the general case where the focal elements are not nested. Moreover $m(\emptyset) = 0$ is
equivalent to the normalization of $\pi$ in the nested case.
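
The three set functions can be written down almost verbatim for a finite frame. In the sketch below a basic belief assignment is assumed to be a dictionary from frozensets to masses; the function names follow the text, while the example assignments are hypothetical.

```python
def bel(m, A):
    """Sum of the masses of the non-empty focal elements included in A."""
    A = frozenset(A)
    return sum(w for F, w in m.items() if F and F <= A)

def pl(m, A):
    """Sum of the masses of the focal elements intersecting A."""
    A = frozenset(A)
    return sum(w for F, w in m.items() if F & A)

def q(m, A):
    """Sum of the masses of the focal elements containing A."""
    A = frozenset(A)
    return sum(w for F, w in m.items() if F >= A)

# Nested focal elements on U = {a, b, c}: the possibilistic special case.
nested = {frozenset("a"): 0.25, frozenset("ab"): 0.5, frozenset("abc"): 0.25}
print(bel(nested, "ab"), pl(nested, "c"), q(nested, "ab"))  # 0.75 0.25 0.75

# A non-normalized assignment, with some mass on the empty set.
open_world = {frozenset(): 0.25, frozenset("a"): 0.25, frozenset("bc"): 0.5}
print(pl(open_world, "a"), 1 - open_world[frozenset()] - bel(open_world, "bc"))  # 0.25 0.25
```

The last line checks the identity $pl(A) = 1 - m(\emptyset) - bel(A^c)$ on one event of the non-normalized example.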

Given a collection of constraints $\{bel(A_i) \geq \alpha_i, i = 1, \ldots, k\}$, choosing a
particular solution among the belief functions satisfying these constraints is not an
obvious matter, since the selected solution depends on the criterion which is used, or
for some criterion which extends the idea of minimal specificity the solution may not
be unique. However, provided that $\sum_i \alpha_i \leq 1$, one noticeable solution maximizing the
cardinality of the focal elements is to take $m^*(A_i) = \alpha_i$, and $m^*(U) = 1 - \sum_i \alpha_i$. If
$\sum_i \alpha_i > 1$, masses should be allocated to the intersection of sub-families of $A_i$'s. See
Dubois and Prade [6] for further discussions.

Similarly, starting with a collection of constraints of the form $\{q(B_j) \geq \beta_j, j = 1, \ldots, n\}$, if
$\sum_j \beta_j \leq 1$, one obvious solution is $m_*(B_j) = \beta_j$, and $m_*(\emptyset) = 1 - \sum_j \beta_j$, obeying a
minimal cardinality principle.

Thus, modelling the two types of information in evidence theory amounts to
using two basic belief assignments $m_*$ and $m^*$ (induced or not by such constraints).
This is equivalent to having a collection of focal elements $F_i$ together with pairs of
weights $(m_*(F_i), m^*(F_i))$, where one of the elements of a pair may be equal to 0.

Mutual consistency between the two types of information also requires that
$m_*$ be "smaller" than $m^*$. Again several non-equivalent definitions are possible, e.g.,

bel*(A) s bel*(A) for any A, or the specialization (or random set inclusion) which 
generalizes $\forall u$, $\pi_*(u) \leq \pi^*(u)$; see Dubois and Prade [5] for detailed presentations.

5 Fusion of multiple-source information 

Let us come back to the possibility theory setting first. As already explained, the
distribution $\pi_*$ represents the information reporting feasible values; it is the
characteristic function of the set of observed interpretations (with their level of
feasibility). $\pi_*(u) = 1$ means that $u$ is totally guaranteed to be possible, while $\pi_*(u) = 0$
does not mean that $u$ is impossible, but only that $u$ has not been reported as
observed (for the attribute value of an object in the class). Observations accumulate:
the more observations we have, the larger $\pi_*$. This is why the elementary possibility
distributions $\min(B_j(u), \beta_j)$, representing the pieces of information "the values in $B_j$
are guaranteed to be feasible at level $\beta_j$", for $j = 1, \ldots, n$, are aggregated disjunctively
by the max operation in (2).

The converse holds for the upper distribution $\pi^*$. Indeed, $\pi^*(u) = 0$ means
that $u$ is impossible for sure, and $\pi^*(u) = 1$ means that $u$ is not at all ruled out, not at
all impossible. The more information we have, the smaller $\pi^*$. Indeed the pieces of
information "it is certain at level $\alpha_i$ that the attribute value is in $A_i$" (i.e. $N(A_i) \geq \alpha_i$),
represented by the elementary distributions $\pi^*_i(u) = \max(A_i(u), 1 - \alpha_i)$, are aggregated
conjunctively by the min operator in (1).

Thus given two epistemic states provided by two different sources of
information, represented by the pairs $(\pi_{*1}, \pi^*_1)$ and $(\pi_{*2}, \pi^*_2)$ respectively, the
fusion process yields $(\max(\pi_{*1}, \pi_{*2}), \min(\pi^*_1, \pi^*_2))$. As already explained, two
consistency conditions should be preserved.

- First, $\min(\pi^*_1, \pi^*_2)$ should remain normalized.

- Second, the mutual consistency condition (3) should still hold, namely
$\max(\pi_{*1}, \pi_{*2}) \leq \min(\pi^*_1, \pi^*_2)$.
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If one of these conditions is not satisfied, a revision process (discussed in the 
next section) should take place. 
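
The fusion step and its two consistency checks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: distributions are stored as dictionaries over a small finite universe, and the function names and the numerical values are hypothetical, not taken from the paper.

```python
from typing import Dict, Tuple

Dist = Dict[str, float]  # degree in [0, 1] attached to each value u


def fuse(lower1: Dist, upper1: Dist, lower2: Dist, upper2: Dist) -> Tuple[Dist, Dist]:
    """Pointwise fusion: max on the lower distributions, min on the upper ones."""
    lower = {u: max(lower1[u], lower2[u]) for u in upper1}
    upper = {u: min(upper1[u], upper2[u]) for u in upper1}
    return lower, upper


def is_normalized(upper: Dist) -> bool:
    """An upper distribution is normalized if some value is fully possible."""
    return max(upper.values()) == 1.0


def mutually_consistent(lower: Dist, upper: Dist) -> bool:
    """Mutual consistency: the lower distribution never exceeds the upper one."""
    return all(lower[u] <= upper[u] for u in upper)


# Hypothetical reports from two sources over the universe {u1, u2, u3}.
lower_a = {"u1": 1.0, "u2": 0.0, "u3": 0.0}
upper_a = {"u1": 1.0, "u2": 1.0, "u3": 0.5}
lower_b = {"u1": 0.0, "u2": 0.6, "u3": 0.0}
upper_b = {"u1": 1.0, "u2": 0.8, "u3": 0.0}

fused_lower, fused_upper = fuse(lower_a, upper_a, lower_b, upper_b)
needs_revision = not (is_normalized(fused_upper)
                      and mutually_consistent(fused_lower, fused_upper))
```

With these values neither condition is violated, so no revision is triggered; lowering `upper_b["u1"]` below 1 would make the fused pair inconsistent.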

In the evidence theory setting, the situation is similar. Given two pairs
$(m_{*1}, m^*_1)$ and $(m_{*2}, m^*_2)$ provided by two sources of information, $m^*_1$ and $m^*_2$
are to be combined by the (conjunctive) Dempster rule of combination, namely (in the
non-normalized case)

$m^*(C) = \sum_{i,j:\ F_i \cap G_j = C} m^*_1(F_i)\, m^*_2(G_j)$,

while $m_{*1}$ and $m_{*2}$ are to be combined disjunctively (Dubois and Prade [5], Smets
[14]), namely

$m_*(C) = \sum_{i,j:\ F_i \cup G_j = C} m_{*1}(F_i)\, m_{*2}(G_j)$.

Again, the mutual consistency condition has to be preserved here (see Section 4).
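
As a rough illustration of the two combination rules, the sketch below represents a mass function as a dictionary from focal elements (frozensets) to weights. The representation, the function names and the example masses are assumptions made for the example only.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]
SetOp = Callable[[FrozenSet[str], FrozenSet[str]], FrozenSet[str]]


def combine(m1: Mass, m2: Mass, op: SetOp) -> Mass:
    """Sum the products m1(F) * m2(G) over all pairs with op(F, G) = C."""
    out: Mass = {}
    for (f, wf), (g, wg) in product(m1.items(), m2.items()):
        c = op(f, g)
        out[c] = out.get(c, 0.0) + wf * wg
    return out


def conjunctive(m1: Mass, m2: Mass) -> Mass:
    """Non-normalized Dempster rule: focal elements are intersected."""
    return combine(m1, m2, lambda f, g: f & g)


def disjunctive(m1: Mass, m2: Mass) -> Mass:
    """Disjunctive rule: focal elements are united."""
    return combine(m1, m2, lambda f, g: f | g)


# Hypothetical masses on the frame U = {G, notG}.
U = frozenset({"G", "notG"})
m_a = {frozenset({"G"}): 0.5, U: 0.5}
m_b = {frozenset({"notG"}): 0.5, U: 0.5}

conj = conjunctive(m_a, m_b)  # some mass goes to the empty set (conflict)
disj = disjunctive(m_a, m_b)  # all the mass ends up on U
```

In the non-normalized conjunctive case the mass assigned to the empty set measures the conflict between the two sources.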

6 Revision 

The upper distribution part of the representation ($\pi^*$ or $m^*$) is the basis for
computing beliefs, in terms of N or bel functions. Interestingly enough, this
framework enables us to express not only beliefs but also reasons for not being
certain about something, when the lower distribution part of the representation ($\pi_*$ or
$m_*$) is not consistent with the upper distribution part.

Indeed, first consider the two-valued case. Then, $\pi^*$ restricts possible locations
of the true state of the world according to the information we have. The two-valued
necessity measure N, then defined by N(A) = 1 if $\{u: \pi^*(u) = 1\} \subseteq A$, and N(A) = 0
otherwise, enables us to describe our beliefs, i.e., the events A such that N(A) = 1.

The reasons not to believe A come from observations, reports, things we
have experienced, that we know for sure as being possible. The distribution $\pi_*$ can
account for the reasons not to believe A. Indeed let us suppose that $\exists u_0,\ \pi_*(u_0) = 1$
and $u_0 \notin A$. Then there is a conflict between statement A, which rules out $u_0$, and the
fact that $u_0$ has been observed. If the mutual consistency condition holds, namely
$\{u: \pi_*(u) = 1\} \subseteq \{u: \pi^*(u) = 1\}$, nothing in $\pi_*$ can provide reasons not to believe
statements supported by $\pi^*$. Thus the set $\{u: \pi_*(u) = 1 \text{ and } \pi^*(u) = 0\}$, when it is
not empty, provides reasons not to believe any statement A such that the inclusion
$A \supseteq \{u: \pi_*(u) = 1 \text{ and } \pi^*(u) = 0\}$ fails.

This provides a basis for a natural way of revising $\pi^*$ by $\pi_*$ in case of
conflict. Namely,

$\pi^*_{revised}(u) = \max(\pi^*(u), \pi_*(u))$. (6)

This is a contraction of the belief set (Gärdenfors [11]). It ensures, in a
minimal way, the mutual consistency condition $\forall u,\ \pi_*(u) \le \pi^*_{revised}(u)$. Note also
that $\pi^*_{revised}$ is still normalized if $\pi^*$ is. This can be extended to general possibility
measures, to which formula (6) can be applied. Then we have reasons not to believe a
statement A which is such that $N(A) \ge \alpha$ on the basis of $\pi^*$, if $\exists u$ such that $u \notin A$
and $\pi_*(u) > 1 - \alpha$, since we are then violating the consistency requirement between
the observations and the belief set, expressed by the inclusion $\forall u,\ \pi_*(u) \le \pi^*(u)$. This
can be illustrated by the Ukalvia story (Smets [16]).

Here is the Ukalvia case. You are told that a newspaper reports that the
economic situation was good last year in Ukalvia. If that is all you know about Ukalvia,
you start to believe the information, let us denote it 'G', to some extent, i.e. N(G) > 0
or bel(G) > 0. But later you learn that the information originated from the
newspaper of the only authorized party in Ukalvia. So you start to think that it may
be propaganda. So you would like to come back to a state close to total
ignorance (as far as beliefs are concerned) about the economic situation in Ukalvia.
Clearly, the idea is that the reasons not to believe A can somewhat inhibit the reasons
to believe A, if any. There are two possibilities: either the situation is good (G) or it
is not good ($\neg G$). Let U be the frame of discernment, $U = \{G, \neg G\}$. So bel(G) > 0
can be represented by the mass function $m(G) = \alpha$; $m(U) = 1 - \alpha$, which is a simple
support function, still equivalent to a possibility distribution, i.e. $\pi^*(G) = 1$,
$\pi^*(\neg G) = 1 - \alpha < 1$.

The state of total ignorance is represented by $m^0(U) = 1$, or if we prefer by
$\pi^0(G) = \pi^0(\neg G) = 1$. In evidence theory, we are apparently looking for a mass
function m' such that $m \oplus m' = m^0$, where $\oplus$ is the Dempster rule of combination.
But this equation has no solution. Note that $m''(\neg G) = \alpha$; $m''(U) = 1 - \alpha$, which
corresponds to the opposite belief $bel''(\neg G) = \alpha > 0$, is not a solution since $m \oplus m''$
has G, $\neg G$, and U as focal elements. We are back to reasons to believe $\neg G$, rather
than to representing reasons not to believe G. In possibility theory, the equation
$\pi^0 = \min(\pi^*, \pi') = 1$ has no solution either (if $\pi^* \ne 1$).

Since we are using simple support functions in this example, let us consider
what the $(\pi_*, \pi^*)$-representation framework means here. In the Ukalvia case, the fact
that it is known from past experience that such an article may be wrong can be
represented by $\pi_*(G) = 0$ and $\pi_*(\neg G) = 1 - \beta \le 1$, i.e. there is strong evidence that
articles published in this newspaper often do not tell the truth. Then applying (6),
i.e. $\pi^*_{revised}(u) = \max(\pi^*(u), \pi_*(u))$, we get $\pi^*_{revised}(G) = 1$ and
$\pi^*_{revised}(\neg G) = \max(1 - \alpha, 1 - \beta)$, i.e. $1 - \beta$ when $\beta \le \alpha$,
which is close to ignorance.
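
The Ukalvia computation can be replayed numerically. The sketch below is a minimal illustration of rule (6); the values chosen for alpha and beta are arbitrary assumptions, and the function name is hypothetical.

```python
from typing import Dict

Dist = Dict[str, float]


def revise_upper(upper: Dist, lower: Dist) -> Dist:
    """Rule (6): raise the upper distribution wherever observations exceed it."""
    return {u: max(upper[u], lower[u]) for u in upper}


# Assumed degrees: alpha is the certainty of G, beta is small when the
# newspaper is known to be unreliable.
alpha, beta = 0.75, 0.25

upper = {"G": 1.0, "notG": 1 - alpha}  # belief induced by the newspaper report
lower = {"G": 0.0, "notG": 1 - beta}   # past experience: such reports may be wrong

revised = revise_upper(upper, lower)
necessity_of_G_before = 1 - upper["notG"]   # N(G) = 1 - possibility of notG
necessity_of_G_after = 1 - revised["notG"]
```

Here the certainty of G drops from 0.75 to 0.25, and the revised pair satisfies the mutual consistency condition by construction.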

Smets [16] has already proposed to deal with a confidence component and 
with a diffidence component separately, for representing belief states. It gives birth to 
a latent belief structure made of a pair of belief functions, one for the confidence part 
and the other for the diffidence part, which cannot be summarized into a unique 
structure of ‘apparent’ beliefs, without loss. Then apparent beliefs are obtained by 
subtracting the diffidence component from the confidence component, using the 
operation inverse of $\oplus$ for non-dogmatic belief functions¹. This may however lead to
belief structures with negative masses which are difficult to interpret. The important 
point (made in this paper) is that the two information components are not of the same 
nature and should be handled separately. Besides, a direct counterpart of (6) can be 
easily obtained in the belief function framework using the above rule of disjunctive 
combination. 

¹ Non-dogmatic belief functions are such that $Pl(A) = 0 \Leftrightarrow A = \emptyset$, so that nothing is
implausible, and nothing, except tautologies, is fully certain: $Bel(A) = 1 \Leftrightarrow A = X$
(since $Pl(A) = 1 - Bel(\neg A)$).

It is important to notice that the revision process modelled by

$\pi^*_{revised}(u) = \max(\pi^*(u), \pi_*(u))$

describes the revision of $\pi^*$ by $\pi_*$ once a new report has been fused with the current
$\pi_*$ using the max operation. Thus priority is given to reports on observed values and it is
a belief revision process. A dual type of revision, which could be called "observation"
revision, would consist in changing $\pi_*$ into $\pi_{*revised}$ when receiving information
restricting $\pi^*$ more (applying fusion based on min combination). Observation
revision is defined by

$\pi_{*revised}(u) = \min(\pi_*(u), \pi^*(u))$. (7)

Then beliefs are given priority w.r.t. observations, i.e., we keep our beliefs,
and we forget some observations, which is unusual except perhaps for ostriches!

Clearly there is another revision process in this framework, which deals with
the upper distribution separately. It is well known and has been extensively considered
in the literature. It is the revision of $\pi^*$ when $\min(\pi^*, \pi^*_{new})$ is no longer
normalized, where $\pi^*_{new}$ is the possibility distribution encoding the new belief
(possibly uncertain) which is received (i.e. if we learn $N(A) \ge \alpha$ we have
$\pi^*_{new}(u) = \max(A(u), 1 - \alpha)$); see Dubois and Prade [9]. Note that the arrival of new
information pertaining to $\pi_*$ only leads to an expansion of $\pi_*$ (via max combination),
since observations just accumulate. The counterpart of (7) in the belief function
framework is the Dempster rule of conjunctive combination.

7 Concluding remarks 

This paper has emphasized the interest of a twofold representation distinguishing
between what is not impossible because not ruled out by our beliefs and what is
known as feasible because it has been observed. In other words, it offers a framework
for reasoning with rules and cases (or examples) in a joint manner. This representation
framework has been discussed in a detailed way and its consequences on fusion and
revision of knowledge have been outlined. Interestingly, one may consider subjective
knowledge as the source of the description of “not impossible” states, while the
“guaranteed possible” states stem from objective data. Our framework could thus
contribute to the debate between objective and subjective probabilities.

There exist other situations where knowledge/information of the two kinds
takes place. Thus, in diagnostic problems, we encounter pieces of information of the
kind "if you have a flu, then your temperature is between 38.5 and 40 °C"; this means
that any temperature inside this range is feasible, and can be explained by a flu for
sure (while a temperature which is outside this range is impossible for a flu). Then
a (very) imprecise observation, say 'temperature between 38 and 41', could not
be explained here for sure by a flu, as pointed out by Besnard and Cordier [1], although
knowing that the temperature is in the interval [38.5, 40] entails in the classical
sense that it is also in the interval [38, 41]. Another informal example illustrating the
distinction between the different types of information can be encountered when
describing scenarios. Namely, in a scenario, we may have the following situations: 1)
an event j necessarily follows another event i; 2) an event j can for sure follow event
i; 3) an event j may follow event i (i.e. nothing forbids it). The difference between
situation 2 and situation 3 is that in situation 2, the observation of j after i is a clue
for this type of scenario (in the sense that j belongs to a set of feasible followers of i),
while it is not the case in situation 3. The difference between situations 1 and 2 is
that in the latter, j is among the events which are known as candidates for following i,
while in situation 1 event j is not just a candidate, it should take place. The
application of these ideas in diagnosis is an open issue.
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Abstract. A (general) preferential entailment is defined by a “preference relation” 
among “states”. States can be either interpretations or sets of interpretations, or 
“copies” of interpretations or of sets of interpretations, although it is known that 
the second kind and the fourth one produce the same notion. Circumscription is a 
special case of the simplest kind, where the states are interpretations. It is already 
known that a large class of preferential entailments where the states are copies of 
interpretations, namely the “cumulative” ones, can be expressed as circumscrip- 
tions in a greater vocabulary. We extend this result to the most general kind of 
general preferential entailment, the additional property requested here is “loop”, 
a strong kind of “cumulativity”. The greater vocabulary needed here is large, but
only a very simple and small set of formulas in this large vocabulary is necessary, 
which should make the method practically useful. 



1 Introduction 

Preferential entailments are useful in knowledge representation. Four kinds are
introduced in Kraus et al. [7], which in fact reduce to three. So far, no system is known
that efficiently computes the most general kinds, but systems do compute circumscription, a
particular case of the simplest kind of preferential entailment. Costello [4] has shown
how, contrary to a claim in [7], an important subclass of an intermediate kind
can be translated into circumscription, by extending the vocabulary. We show that an
important subclass of the most general kind can also be translated into circumscription
by modifying the vocabulary. We begin with notations (§2), definitions (§3) and useful
known results (§4). Then, we need two technical definitions: an auxiliary vocabulary
in which the theories of the original language correspond to single interpretations in
the new one (§5); and a simplified preference relation for a large class of preferential
entailments (§6). Finally, we describe the translation (§7) and detail an example (§8).

2 Notations and Framework 

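• We work in a propositional language $\mathbf{L}$. As usual, $\mathbf{L}$ also denotes the set of all the
formulas. $\mathcal{V}(\mathbf{L})$, the vocabulary of $\mathbf{L}$, denotes a set of propositional symbols. Letters
$\varphi, \psi$ denote formulas in $\mathbf{L}$. A formula will generally be assimilated to its equivalence
class. Letters such as $\mathcal{T}$ or $\mathcal{C}$ denote sets of formulas (i.e. subsets of $\mathbf{L}$). Two logical
constants $\top$ and $\bot$ denote respectively the true and the false formulas.

• Letters $\mu, \nu$ denote interpretations for $\mathbf{L}$, identified with subsets of $\mathcal{V}(\mathbf{L})$. $\mu \models \varphi$ and
$\mu \models \mathcal{T}$ are defined classically. If $\mathbf{M}_1 \subseteq \mathbf{M}$, $\mathbf{M}_1 \models \mathcal{T}$ means $\mu \models \mathcal{T}$ for any $\mu \in \mathbf{M}_1$. For
a set E, $\mathcal{P}(E)$ denotes the set of the subsets of E. The set $\mathcal{P}(\mathcal{V}(\mathbf{L}))$ of the interpretations
for $\mathbf{L}$ is denoted by $\mathbf{M}$. A model of $\mathcal{T}$ is an interpretation $\mu$ such that $\mu \models \mathcal{T}$; $\mathbf{M}(\mathcal{T})$ and
$\mathbf{M}(\varphi)$ denote respectively the sets of the models of $\mathcal{T}$ and $\varphi$.

• $\mathcal{T} \models \varphi$, $\mathcal{T} \models \mathcal{T}_1$ and $Th(\mathcal{T})$ are defined classically. A theory is a subset of $\mathbf{L}$ closed
for deduction; $\mathbf{T}$ denotes the set $\{\mathcal{T} \subseteq \mathbf{L} \ /\ \mathcal{T} = Th(\mathcal{T})\}$ of the theories of $\mathbf{L}$. If $\mathcal{T}_1$ is
a theory, we get $\mathcal{T} \subseteq \mathcal{T}_1$ iff $\mathcal{T}_1 \models \mathcal{T}$, for any $\mathcal{T} \subseteq \mathbf{L}$.

• A theory $\mathcal{C} \in \mathbf{T}$ is complete if $\forall \varphi \in \mathbf{L}$, $\varphi \in \mathcal{C}$ iff $\neg\varphi \notin \mathcal{C}$. We denote by $\mathbf{C}$ the
set of all the complete theories of $\mathbf{L}$. $Th(\mu)$ denotes the set of the formulas satisfied
by $\mu$. For any subset $\mathbf{M}_1$ of $\mathbf{M}$, $Th(\mathbf{M}_1) = \{\varphi \ /\ \mathbf{M}_1 \models \varphi\} = \bigcap_{\mu \in \mathbf{M}_1} Th(\mu)$. This
ambiguous use of Th and of $\models$ (for formulas or interpretations) is usual. For any $\mathcal{T} \in \mathbf{T}$,
$\mathcal{T} = \bigcap_{\mathcal{C} \in \mathbf{C},\ \mathcal{C} \models \mathcal{T}} \mathcal{C}$. Th defines a one-to-one mapping between $\mathbf{M}$ and $\mathbf{C}$: $Th(\mu) \in \mathbf{C}$
for any $\mu \in \mathbf{M}$. If $\mathcal{V}(\mathbf{L})$ is finite, $\theta$ denotes the canonical one-to-one mapping from
$\mathcal{P}(\mathbf{M})$ to $\mathbf{L}$: for any $\mathbf{M}_1 \subseteq \mathbf{M}$, $\theta(\mathbf{M}_1)$ is the formula such that $\mathbf{M}(\theta(\mathbf{M}_1)) = \mathbf{M}_1$.

• $\mathbf{T}$, $\mathbf{C}$, $\mathbf{M}$, Th, $\theta$ and $\models$ should be indexed by $\mathbf{L}$. To keep the notations readable, we
will denote two languages by, say, $\mathbf{L}$ and $\mathbf{L}'$, and all that concerns $\mathbf{L}$ will be denoted as
above, while we will use $\mathbf{T}'$, $\mathbf{M}'$, $Th'$, $\theta'$ and $\models'$ for what concerns $\mathbf{L}'$.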

3 The Various Kinds of Preferential Entailments 

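Definition 3.1. A pre-circumscription f (in $\mathbf{L}$) is an extensive (i.e., $f(\mathcal{T}) \supseteq \mathcal{T}$ for any $\mathcal{T}$)
mapping from $\mathbf{T}$ to $\mathbf{T}$. For any subset $\mathcal{T}$ of $\mathbf{L}$, we use the abbreviation $f(\mathcal{T}) = f(Th(\mathcal{T}))$,
assimilating a pre-circumscription to a particular extensive mapping from $\mathcal{P}(\mathbf{L})$ to itself¹.
We write $f(\varphi)$ for $f(\{\varphi\}) = f(Th(\varphi))$. □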

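Definitions 3.2 1. A set of states $\mathbf{S}$ is a set of “copies” of elements of $\mathbf{T}$ (or equivalently
[3] a set of “copies” of subsets of $\mathbf{M}$); there exists a mapping l from $\mathbf{S}$ to $\mathbf{T}$ and, for
any $\mathcal{T} \in \mathbf{T}$, the subset $l^{-1}(\mathcal{T})$ of $\mathbf{S}$ is the set of the copies of $\mathcal{T}$.

2. As usual, we define $l(\mathbf{S}) = \{l(s)\}_{s \in \mathbf{S}} = \{\mathcal{T} \in \mathbf{T} \ /\ l^{-1}(\mathcal{T}) \ne \emptyset\}$. For any $\mathcal{T} \subseteq \mathbf{L}$,
$\mathbf{S}(\mathcal{T})$ is the subset of $\mathbf{S}$ defined by $\mathbf{S}(\mathcal{T}) = \{s \in \mathbf{S} \ /\ l(s) \models \mathcal{T}\}$.

3. For any $\mathcal{T} \subseteq \mathbf{L}$ we define the subset of $\mathbf{T}$: $\mathbf{W}(\mathcal{T}) = \{\mathcal{T}_1 \in \mathbf{T} \ /\ \mathcal{T} \subseteq \mathcal{T}_1\}$. We write
$\mathbf{W}(\varphi)$ for $\mathbf{W}(\{\varphi\})$. Notice that we get $\mathbf{S}(\mathcal{T}) = l^{-1}(\mathbf{W}(\mathcal{T}))$.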

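Definitions 3.3 1. A general preference relation $\prec_g$ is a binary relation over $\mathbf{S}$. For
any $\mathcal{T} \in \mathbf{T}$, we define the subsets $\mathbf{S}_{\prec_g}(\mathcal{T})$ of $\mathbf{S}$ and $\mathbf{W}_{\prec_g}(\mathcal{T})$ of $\mathbf{T}$ as follows:
$\mathbf{S}_{\prec_g}(\mathcal{T}) = \{s \in \mathbf{S}(\mathcal{T}) \ /\ s_1 \prec_g s \text{ for no } s_1 \in \mathbf{S}(\mathcal{T})\}$, and $\mathbf{W}_{\prec_g}(\mathcal{T}) = l(\mathbf{S}_{\prec_g}(\mathcal{T}))$.

2. The general preferential entailment $f_{\prec_g}$ is the pre-circumscription defined by
$f_{\prec_g}(\mathcal{T}) = \bigcap_{\mathcal{T}_1 \in \mathbf{W}_{\prec_g}(\mathcal{T})} \mathcal{T}_1$ for any $\mathcal{T} \subseteq \mathbf{L}$.

This is the definition of [3, Definitions 3.1, 3.2], originating from [7, Definition 3.11].
Particular cases give the most classical kinds of preferential entailments: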

Definitions 3.4 1. If $l(\mathbf{S}) \subseteq \mathbf{C}$ (instead of $l(\mathbf{S}) \subseteq \mathbf{T}$), let us call the general preference
relation a multi preference relation, which we will denote by $\prec_m$ instead of $\prec_g$, and
let us call $f_{\prec_m}$ a multi preferential entailment.

2. If $\mathbf{S} = \mathbf{T}$ and l = identity, let us call $\prec_g$ a simplified general preference relation.

3. If $\mathbf{S} = \mathbf{C}$ and l = identity (i.e. restrictions 1 and 2 apply), then the relation, defined
in $\mathbf{C}$, is called a preference relation $\prec$ and $f_{\prec}$ is called a preferential entailment.

As we work in propositional logic, $\mathbf{C}$ can be replaced by $\mathbf{M}$ and $\mathbf{T}$ by $\mathcal{P}(\mathbf{M})$ (see
e.g. [3]). Point 1 originates from [7, Definition 5.6] and point 3 from [18]. The notion of
general preferential entailment has been qualified as “cumbersome” in the introducing
paper [7]. Then, this notion has been tamed in various texts [1,2,3,6,13,10,14].

¹ For a reader familiar with [7], a pre-circumscription is an inference operation satisfying the
full (or theory) versions of reflexivity, left logical equivalence, right weakening and AND.

The best known kind of preferential entailment is circumscription: 

Definition 3.5. P, Q, Z is a partition of $\mathcal{V}(\mathbf{L})$. The symbols in P, Z and Q are respectively
circumscribed, varying and fixed. We define the preference relation $\prec_{(P,Q,Z)}$ in $\mathbf{M}$ by:
$\mu \prec_{(P,Q,Z)} \nu$ if $P \cap \mu \subset P \cap \nu$ and $Q \cap \mu = Q \cap \nu$ ($\subset$: strict inclusion).

The circumscription CIRC(P, Q, Z) is the preferential entailment $f_{\prec_{(P,Q,Z)}}$. □
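
For a small finite vocabulary, Definition 3.5 can be executed by brute force. The sketch below is an assumption-laden illustration, not one of the systems cited in the paper: interpretations are frozensets of true atoms, a theory is given as a Python predicate on interpretations, and the "bird" example with its atom names is hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List

Interp = FrozenSet[str]
Formula = Callable[[Interp], bool]


def interpretations(vocab: Iterable[str]) -> List[Interp]:
    """All subsets of the vocabulary, i.e. all interpretations."""
    atoms = sorted(vocab)
    return [frozenset(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]


def prefers(mu: Interp, nu: Interp, P: FrozenSet[str], Q: FrozenSet[str]) -> bool:
    """mu is preferred to nu: fewer circumscribed atoms, same fixed atoms."""
    return (P & mu) < (P & nu) and (Q & mu) == (Q & nu)


def circ_models(theory: Formula, vocab, P, Q) -> List[Interp]:
    """Models of the theory that are minimal for the preference relation."""
    models = [m for m in interpretations(vocab) if theory(m)]
    return [m for m in models if not any(prefers(n, m, P, Q) for n in models)]


def circ_entails(theory: Formula, formula: Formula, vocab, P, Q) -> bool:
    """The formula holds in every minimal model of the theory."""
    return all(formula(m) for m in circ_models(theory, vocab, P, Q))


# Hypothetical example: bird, and (bird and not ab) implies flies.
vocab = {"bird", "ab", "flies"}
theory = lambda m: "bird" in m and ("ab" in m or "flies" in m)
flies = lambda m: "flies" in m
P, Q = frozenset({"ab"}), frozenset({"bird"})  # "flies" is varying

minimal = circ_models(theory, vocab, P, Q)
concludes_flies = circ_entails(theory, flies, vocab, P, Q)
```

Classically the theory does not entail `flies`, since the interpretation where `ab` holds is a model; circumscribing `ab` discards that model.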

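Definition 3.6. $\Phi \subseteq \mathbf{L}$, $\mathcal{V}(\mathbf{L}) = Q \cup Z$ (disjoint union), $P_\Phi = \{P_\varphi\}_{\varphi \in \Phi}$ is a set of distinct
propositional symbols not in $\mathbf{L}$. The formula circumscription of the set of formulas $\Phi$,
with Q fixed and Z varying, is defined as follows, for any $\mathcal{T} \subseteq \mathbf{L}$:

$CIRCF(\Phi, Q, Z)(\mathcal{T}) = CIRC(P_\Phi, Q, Z)(\mathcal{T} \cup \{P_\varphi \Leftrightarrow \varphi\}_{\varphi \in \Phi}) \cap \mathbf{L}$.

CIRC is defined in the greater language $\mathbf{L}'$: $\mathcal{V}(\mathbf{L}') = \mathcal{V}(\mathbf{L}) \cup P_\Phi$.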

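Remark 3.1. $CIRCF(\Phi, Q, Z)$ is the preferential entailment $f_{\prec}$ in $\mathbf{L}$ associated with
the preference relation $\prec_{(\Phi;Q,Z)}$ defined in $\mathbf{M}$ by:

$\mu \prec_{(\Phi;Q,Z)} \nu$ if $Th(\mu) \cap \Phi \subset Th(\nu) \cap \Phi$ and $Q \cap \mu = Q \cap \nu$. □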

These are the usual propositional adaptations [17,12,4] of the original predicate
calculus versions [8,9,16]. Circumscription is a preferential entailment (Definition 3.4-3)
and various systems perform useful automatic computation for propositional
circumscription². Thus, it is interesting to express more complex formalisms in terms of
circumscription. This has already been done for multi preferential entailments [4] (see also
[13,11]); what we do now is extend this technique to general preferential entailments.

4 A Reminder: Characterization Results 

Here are known results from [7,17] and other texts (see [13,14] for precise references). 
We consider now that V (L) is finite. 

(Notice that in this case we can restrict our attention to finite sets S [7].) 

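Definition 4.1. A general preference relation $\prec_g$ is safely founded (sf), if for any
$s \in \mathbf{S}(\mathcal{T}) - \mathbf{S}_{\prec_g}(\mathcal{T})$, there exists $s_1 \in \mathbf{S}_{\prec_g}(\mathcal{T})$ such that $s_1 \prec_g s$.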

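Definitions 4.2 Here are various properties a pre-circumscription may possess. The sets
$\mathcal{T}$, $\mathcal{T}''$, $\mathcal{T}_1$, $\mathcal{T}_2$, ... are in $\mathbf{T}$ (recall that intersecting theories corresponds to a
disjunction $\vee$ of formulas):

Case reasoning: $f(\mathcal{T}_1 \cap \mathcal{T}_2) \models f(\mathcal{T}_1) \cap f(\mathcal{T}_2)$. (CR)

Disjunctive coherence: $f(\mathcal{T}_1) \cup f(\mathcal{T}_2) \models f(\mathcal{T}_1 \cap \mathcal{T}_2)$. (DC)

Cumulative transitivity: If $\mathcal{T}'' \subseteq f(\mathcal{T})$, then $f(\mathcal{T} \cup \mathcal{T}'') \subseteq f(\mathcal{T})$. (CT)

Cumulative monotony: If $\mathcal{T}'' \subseteq f(\mathcal{T})$, then $f(\mathcal{T}) \subseteq f(\mathcal{T} \cup \mathcal{T}'')$. (CM)

Cumulativity: If $\mathcal{T}'' \subseteq f(\mathcal{T})$, then $f(\mathcal{T}) = f(\mathcal{T} \cup \mathcal{T}'')$. (CUMU)

If $\mathcal{T}_2 \subseteq f(\mathcal{T}_1), \ldots, \mathcal{T}_n \subseteq f(\mathcal{T}_{n-1}), \mathcal{T}_1 \subseteq f(\mathcal{T}_n)$, then $f(\mathcal{T}_1) = f(\mathcal{T}_n)$. (LOOP$_n$)

(Loop): For any integer $n \ge 2$, f satisfies (LOOP$_n$). (LOOP)

Preservation of consistency: If $f(\mathcal{T}_1) = Th(\bot) = \mathbf{L}$, then $\mathcal{T}_1 = \mathbf{L}$. (PC)

² Here are three examples: LWB (http://lwbwww.unibe.ch:8080/LWBtheory.html), SMODELS
(http://www.tcs.hut.fi/Software/smodels/), and DLV (http://www.dbai.tuwien.ac.at/proj/dlv/).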
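
To make (CUMU) concrete, one can test it by brute force on a tiny preferential entailment. The sketch below works on the semantic side, under the assumption that a theory is identified with its set of models: f maps a set of models A to its minimal elements, the premise "T'' is included in f(T)" becomes "f(A) is included in B", and adding T'' to T becomes the intersection of A and B. The three abstract interpretations a, b, c and both relations are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

Relation = Set[Tuple[str, str]]  # (x, y) means x is preferred to y


def subsets(items: Iterable[str]) -> List[FrozenSet[str]]:
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def minimal(models: FrozenSet[str], less: Relation) -> FrozenSet[str]:
    """Models of f(T): elements with no preferred element in the same set."""
    return frozenset(m for m in models
                     if not any((n, m) in less for n in models))


def satisfies_cumu(universe: Iterable[str], less: Relation) -> bool:
    """Check f(A) == f(A & B) whenever f(A) is included in B."""
    for a in subsets(universe):
        fa = minimal(a, less)
        for b in subsets(universe):
            if fa <= b and minimal(a & b, less) != fa:
                return False
    return True


universe = {"a", "b", "c"}
strict_order = {("a", "b"), ("b", "c"), ("a", "c")}
not_transitive = {("a", "b"), ("b", "c")}  # "a" is not preferred to "c"

cumu_for_order = satisfies_cumu(universe, strict_order)
cumu_for_other = satisfies_cumu(universe, not_transitive)
```

The second relation fails the test with A = {a, b, c} and B = {a, c}: it is not safely founded, since the only element preferred to c is b, which is not minimal.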



Proposition 4.1. For pre-circumscriptions: 1. (CR) implies (CT).

2. As (CUMU) is (CM) + (CT), in case of (CR), (CUMU) and (CM) are equivalent.

3. (LOOP$_2$) is equivalent to (CUMU), (LOOP$_{n+1}$) is stronger than (LOOP$_n$).

4. (CR) and (CUMU) imply (LOOP). □

Theorem 4.1. 1. For any general preferential entailment $f_{\prec_g}$, there exists a simplified
general preference relation $\prec_{sg}$ such that $f_{\prec_g} = f_{\prec_{sg}}$.

2. A pre-circumscription f satisfies (CT) iff it is a general preferential entailment.

3. A pre-circumscription f satisfies (CUMU) - respectively (LOOP) - iff it is a
general preferential entailment defined by a relation $\prec_g$ satisfying (sf) - respectively a
transitive and irreflexive relation (i.e. a strict order) $\prec_g$ satisfying (sf) (cf point 5).

4. A pre-circumscription satisfies (CR) iff it is a multi preferential entailment.

5. A pre-circumscription satisfies (CR) and (CUMU) iff it is a multi preferential
entailment defined in a finite set $\mathbf{S}$ by a relation which is a strict order (on a finite
set this implies (sf) and, contrary to 3 for (LOOP), (sf) alone suffices here).

6. A pre-circumscription satisfies (CR) and (DC) iff it is a preferential entailment.

7. A preferential entailment satisfies (CUMU) and (PC) iff it is defined by a preference
relation $\prec$ which is transitive and irreflexive, iff it is a formula circumscription. □



5 Modifying the Vocabulary 

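Definitions 5.1 $\mathbf{L}$ and $\mathbf{L}'$ are two languages, f is a mapping from $\mathbf{T}$ to $\mathbf{T}$ and f' is
a pre-circumscription defined in $\mathbf{L}'$. We say that f is obtained from f' by (Def$\rightleftarrows$) -
respectively by (Def$\rightleftarrows$4) - if there exist two mappings $b_1$ from $\mathbf{T}$ to $\mathbf{T}'$ and $b_2$ from $\mathbf{T}'$
to $\mathbf{T}$ such that the three conditions ($\rightleftarrows$1-3) - respectively the four conditions ($\rightleftarrows$1-4) -
below are satisfied and such that we have, for any $\mathcal{T} \in \mathbf{T}$: $f(\mathcal{T}) = b_2(f'(b_1(\mathcal{T})))$.

1. $b_1$ preserves inclusion:
for any $\mathcal{T}_1, \mathcal{T}_2$ in $\mathbf{T}$, if $\mathcal{T}_1 \subseteq \mathcal{T}_2$, then $b_1(\mathcal{T}_1) \subseteq b_1(\mathcal{T}_2)$. ($\rightleftarrows$1)

2. $b_1 \circ b_2$ is contractive on the set $f'(b_1(\mathbf{T}))$:
$b_1(b_2(f'(b_1(\mathcal{T})))) \subseteq f'(b_1(\mathcal{T}))$ for any $\mathcal{T} \in \mathbf{T}$. ($\rightleftarrows$2)

3. $b_2 \circ f' \circ b_1$ is extensive: for any $\mathcal{T} \in \mathbf{T}$, $\mathcal{T} \subseteq b_2(f'(b_1(\mathcal{T})))$. ($\rightleftarrows$3)

4. $b_2$ preserves inclusion on the set $f'(b_1(\mathbf{T}))$: for any $\mathcal{T}_1, \mathcal{T}_2$ in $\mathbf{T}$,
if $f'(b_1(\mathcal{T}_1)) \subseteq f'(b_1(\mathcal{T}_2))$, then $b_2(f'(b_1(\mathcal{T}_1))) \subseteq b_2(f'(b_1(\mathcal{T}_2)))$. ($\rightleftarrows$4)

($\rightleftarrows$3) means that $f = b_2 \circ f' \circ b_1$ is a pre-circumscription. Notice that we need only
to know the value of $b_2$ on the subset $f'(b_1(\mathbf{T})) = \{f'(b_1(\mathcal{T})) \ /\ \mathcal{T} \in \mathbf{T}\}$ of $\mathbf{T}'$.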

The following preservation results are immediate:

Proposition 5.1. 1. If f' is a pre-circumscription defined in a language $\mathbf{L}'$ which
satisfies (CUMU) - resp. (LOOP) - and if f is defined from f' by (Def$\rightleftarrows$), then f is
a pre-circumscription defined in $\mathbf{L}$ which satisfies (CUMU) - resp. (LOOP).

2. If f' is a pre-circumscription defined in a language $\mathbf{L}'$ which satisfies (CT) -
respectively (CM) - and if f is defined from f' by (Def$\rightleftarrows$4), then f is a pre-circumscription
defined in $\mathbf{L}$ which satisfies (CT) - respectively (CM). □



6 A Useful Simplified General Preference Relation 

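Definition 6.1. [7] Let f be a pre-circumscription. We define the following general
preference relation $\prec_{klm}$: 1. $\mathbf{S} = f(\mathbf{T}) = \{f(\mathcal{T}) \ /\ \mathcal{T} \in \mathbf{T}\}$,

2. l is the mapping from $\mathbf{S}$ to $\mathbf{T}$ defined by $l(f(\mathcal{T})) = f(\mathcal{T})$ for any $\mathcal{T} \in \mathbf{T}$.

3. $f(\mathcal{T}_1) \prec_{klm} f(\mathcal{T}_2)$ if $f(\mathcal{T}_1) \ne f(\mathcal{T}_2)$ and there exists $\mathcal{T}_3 \in \mathbf{T}$ such that
$f(\mathcal{T}_1) = f(\mathcal{T}_3)$ and $\mathcal{T}_3 \subseteq f(\mathcal{T}_2)$.

The set $f(\mathbf{T})$ is then the set denoted by $l(\mathbf{S})$ in Definition 3.3 for the general preference
relation $\prec_{klm}$ defined here. The relation $\prec_{klm}$ is introduced in [7, Theorem 3.25] in order to
prove “the hard part” of Theorem 4.1-3 for (CUMU). The relation $\prec_{klm}$ can be replaced
by a simplified general preference relation (see also [1,2]):

Definition 6.2. Let $\prec_g$ be a general preference relation (defining thus a set $\mathbf{S}$ and a
mapping l). We define the following simplified general preference relation $\prec_s$: for any
$\mathcal{T}_1, \mathcal{T}_2 \in \mathbf{T}$, $\mathcal{T}_1 \prec_s \mathcal{T}_2$ if 1. $\mathcal{T}_1 = Th(\bot)$ and $\mathcal{T}_2 \notin l(\mathbf{S}) \cup \{Th(\bot)\}$, or

2. $\mathcal{T}_1 = l(s_1)$, $\mathcal{T}_2 = l(s_2) \ne Th(\bot)$, and $s_1 \prec_g s_2$, for some $s_1, s_2$ in $\mathbf{S}$.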



Proposition 6.1. If a general preference relation $\prec_g$ is such that the mapping l is
injective, we have, for any $\mathcal{T} \in \mathbf{T}$, $\mathbf{W}_{\prec_g}(\mathcal{T}) \cup \{Th(\bot)\} = \mathbf{W}_{\prec_s}(\mathcal{T}) \cup \{Th(\bot)\}$. Thus we
have $f_{\prec_g} = f_{\prec_s}$ where $\prec_s$ is the simplified general preference relation defined from $\prec_g$
as in Definition 6.2.

Proof: As I is injective, for any si,S 2 in S, si -<g S2 iff there exist Ti and T 2 in 
/(S) = /(S) suchthatsi = l{Ti),S 2 = 1{T2) andTi -<s T 2 - Moreover rfi(_L) T 
for any T ^ ((S), and Th{lf) G W(T) for any T G T. Thus, for any T G T, we have 
W^^(T) U {T/i(_L)} = W^^(T) U {T/i(_L)}. As Tfi(_L) G W((p) for any v? G L, 
we get that if and -<2 are two general preference relations such that W^^(T) = 
W^ 2 ('T) U {T/i(_L)}, then f^^{T) = Thus we get here = A,. □ 

Definition 6.3. The mapping l of the relation $\prec_{klm}$ is injective. We can thus consider
the simplified general preference relation, that we call $\prec_{nf}$, defined from $\prec_{klm}$ as in
Definition 6.2. We call $\prec_{nf}$ the normal general preference relation associated to f. □

We get $f_{\prec_{klm}} = f_{\prec_{nf}}$ from Proposition 6.1.

As $\mathcal{V}(\mathbf{L})$ is finite, we will now generally replace $\mathbf{T}$ by $\mathbf{L}$. $\mathbf{W}(\varphi)$ will be a set of
formulas, any simplified general preference relation will be a binary relation in $\mathbf{L}$ and,
if f is a pre-circumscription, $f(\varphi) = \psi$ will replace $f(\varphi) = Th(\psi)$.

Proposition 6.2. If f satisfies (CT), the normal general preference relation $\prec_{nf}$
associated to f is the binary relation described as follows: for any $\varphi_1, \varphi_2$ in $\mathbf{L}$,

$\varphi_1 \prec_{nf} \varphi_2$ iff 1. $\varphi_1 = \bot$ and $\varphi_2 \ne f(\varphi)$ for any $\varphi \in \mathbf{L}$, or

2. $\varphi_2 \ne \bot$, $\varphi_1 \ne \varphi_2$ and there exist $\varphi_3, \varphi_4$ such that $f(\varphi_3) = \varphi_1$, $f(\varphi_4) = \varphi_2$, $\varphi_2 \models \varphi_3$.

Proof: This is a consequence of Definitions 6.1 and 6.3, taking into account
two peculiarities of $\prec_{klm}$. Firstly, the set $l(\mathbf{S}) = f(\mathbf{L})$ associated to the general
preference relation $\prec_{klm}$ contains $\bot$: as f is a pre-circumscription, we have $f(\bot) = \bot$.
Secondly, we never have $\bot \prec_{klm} f(\varphi)$. Indeed, $\bot \prec_{klm} f(\varphi)$ iff $f(\bot) \ne f(\varphi)$ and
there exists $\varphi_1 \in \mathbf{L}$ such that $f(\varphi_1) = \bot$ and $f(\varphi) \models \varphi_1$. From (CT) we get then
$f(f(\varphi)) = f(\varphi) \models f(\varphi_1)$, i.e. $f(\varphi) = \bot = f(\bot)$: a contradiction. □

These results show that all the general preference relations considered in [7] could 
have been replaced directly by a simplified general preference relation. 

Proposition 6.3. If f is a pre-circumscription satisfying (CUMU), then it is a general
preferential entailment which can be defined by $f = f_{\prec_{nf}}$.

More precisely we have, for any $\varphi \in \mathbf{L}$: $\mathbf{W}_{\prec_{nf}}(\varphi) = \{f(\varphi), \bot\}$. □

We omit the proof, as it is an adaptation of a proof given in [7, proof of Theorem 3.25],
establishing that we have in this case $\mathbf{W}_{\prec_{klm}}(\varphi) = \{f(\varphi)\}$. The fact that we use a
simplified general preference relation even simplifies the matter. Notice also that, as in
[7, proof of Theorem 3.25] for $\prec_{klm}$, we get that in this case $\prec_{nf}$ is (sf).

Here is another result extrapolated from [7], which will be useful in our translation
of some general preferential entailments in terms of circumscription (cf the proof of [7,
Theorem 4.9], which gives the result for what concerns $\prec_{klm}$ and its transitive closure):

Proposition 6.4. A pre-circumscription f satisfying (CUMU) satisfies (LOOP) iff the
transitive closure $\prec_{nf}^{+}$ of the normal general preference relation $\prec_{nf}$ associated to f
is irreflexive. In this case, i.e. if f satisfies (LOOP), we have
$\mathbf{W}_{\prec_{nf}}(\varphi) = \mathbf{W}_{\prec_{nf}^{+}}(\varphi) = \{f(\varphi), \bot\}$, thus $f = f_{\prec_{nf}} = f_{\prec_{nf}^{+}}$. □

7 Finite General Preferential Entailments as Circumscriptions 

Theorem 7.1. A pre-circumscription f in $\mathbf{L}$ satisfies (LOOP) iff it can be
expressed by (Def$\rightleftarrows$) - or by (Def$\rightleftarrows$4) - from a formula circumscription
$f' = CIRCF(\Phi', \emptyset, \mathcal{V}(\mathbf{L}'))$ defined in a language $\mathbf{L}'$.

By Proposition 6.4, “A pre-circumscription f” could be replaced by “A general
preferential entailment f”. Recall a similar result for multi preferential entailments
satisfying (CM) ([11, Theorem 31], extrapolated from [4, Theorem 15]). The reason
why we need (LOOP) here instead of just (CUMU) is that we must get a strict order
relation in order to get a formula circumscription (see Theorem 4.1, points 3, 5 and 7).

Constructive proof: (if): Any formula circumscription f' satisfies (CUMU) and (LOOP)
from Prop. 4.1-4 and Th. 4.1 (-6,7). Then f satisfies (LOOP) from Prop. 5.1-1.







Y. Moinard 



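(only if): $f = f_{\prec_{nf}} = f_{\prec_{nf}^{+}}$ from Proposition 6.4, $\prec_{nf}$ being described in Proposition 6.2.
$\prec_{nf}^{+}$ is a strict order from Proposition 6.4 and in fact this proof works for any simplified
general preference relation $\prec_s$ such that $f = f_{\prec_s}$ and which is a strict order. We define
(1) a language $\mathbf{L}'$ such that there exists a one-to-one mapping p from $\mathbf{M}$ to $\mathcal{V}(\mathbf{L}')$ and
(2) a one-to-one mapping b from $\mathcal{P}(\mathbf{M})$ to $\mathbf{M}' = \mathcal{P}(\mathcal{V}(\mathbf{L}'))$:

For any $\mu \subseteq \mathcal{V}(\mathbf{L})$, $p(\mu) = P'_\mu \in \mathcal{V}(\mathbf{L}')$. (1)

For any $\mathbf{M}_1 \subseteq \mathbf{M}$, $b(\mathbf{M}_1) = p(\mathbf{M} - \mathbf{M}_1) = \{P'_\mu \in \mathcal{V}(\mathbf{L}') \ /\ \mu \in \mathbf{M} - \mathbf{M}_1\}$. (2)

Then, we define (3) a one-to-one mapping $\bar{b}$ from $\mathbf{L}$ to
$\mathbf{L}'_C = \{\varphi' \in \mathbf{L}' \ /\ Th'(\varphi') \in \mathbf{C}'\} = \{\bigwedge_{P' \in \mathcal{P}'} P' \wedge \bigwedge_{P' \in \mathcal{V}(\mathbf{L}') - \mathcal{P}'} \neg P' \ /\ \mathcal{P}' \subseteq \mathcal{V}(\mathbf{L}')\}$
($\mathbf{L}'_C$ is the subset of $\mathbf{L}'$ corresponding to $\mathbf{C}'$ in the same way as $\mathbf{L}'$ corresponds to $\mathbf{T}'$)
and (4) a mapping $b_1$ from $\mathbf{L}$ to $\mathbf{L}'$. For any $\varphi \in \mathbf{L}$:

$\bar{b}(\varphi) = \left(\bigwedge_{\mu \in \mathbf{M} - \mathbf{M}(\varphi)} P'_\mu\right) \wedge \left(\bigwedge_{P'_\mu \in \mathcal{V}(\mathbf{L}') \ /\ \mu \in \mathbf{M}(\varphi)} \neg P'_\mu\right)$. (3)

$b_1(\varphi) = \bigwedge_{\mu \in \mathbf{M} - \mathbf{M}(\varphi)} P'_\mu$. (4)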
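Thus, $\mathbf{M}'(\bar{b}(\varphi))$ is the singleton $\{b(\mathbf{M}(\varphi))\}$, where $b(\mathbf{M}(\varphi)) = \{P'_\mu \ /\ \mu \in \mathbf{M} - \mathbf{M}(\varphi)\}
= \{P'_\mu \ /\ \mu \in \mathbf{M}(\neg\varphi)\}$. Here is a feature of these mappings, which greatly
simplifies the translation: for any $\mathbf{M}_1, \mathbf{M}_2 \subseteq \mathbf{M}$: $\mathbf{M}_1 \subseteq \mathbf{M}_2$ iff $b(\mathbf{M}_2) \subseteq b(\mathbf{M}_1)$, i.e.,

for any $\varphi, \psi$ in $\mathbf{L}$, $\varphi \models \psi$ iff $b(\mathbf{M}(\psi)) \subseteq b(\mathbf{M}(\varphi))$. (5)

From (4), $b_1(\varphi)$ is the formula which has the set $\{\mu' \ /\ b(\mathbf{M}(\varphi)) \subseteq \mu' \subseteq \mathcal{V}(\mathbf{L}')\}$ for
set of models. Thanks to (5), we get that $b_1(\varphi)$ is the formula such that
$\mathbf{M}'(b_1(\varphi)) = \{b(\mathbf{M}(\psi)) \ /\ \psi \in \mathbf{W}(\varphi)\}$: $b_1(\varphi)$ is an image of the set $\mathbf{W}(\varphi)$ in $\mathbf{L}'$. As $b_1$ is
injective, it defines a one-to-one mapping between $\mathbf{L}$ and the set $b_1(\mathbf{L}) = \{b_1(\varphi) \ /\ \varphi \in \mathbf{L}\} =
\{\bigwedge_{P' \in \mathcal{P}'} P' \ /\ \mathcal{P}' \subseteq \mathcal{V}(\mathbf{L}')\}$ of all the conjunctions of atoms of $\mathbf{L}'$.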

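We must now come back from $\mathbf{L}'$ to the original language $\mathbf{L}$.

The one-to-one mapping $b^{-1}$ from $\mathbf{M}'$ to $\mathcal{P}(\mathbf{M})$ can be described as follows (cf (2)):

$b^{-1}(\mu') = \{p^{-1}(P'_\mu) \ /\ P'_\mu \in \mathcal{V}(\mathbf{L}') - \mu'\} = \{\mu \ /\ P'_\mu \in \mathcal{V}(\mathbf{L}') - \mu'\}$.

We define the mapping $b_2$ from $\mathbf{L}'$ to $\mathbf{L}$ by the following two equivalent equations:

for any $\varphi' \in \mathbf{L}'$, $b_2(\varphi') = b_1^{-1}\left(\bigwedge_{P' \in \mathcal{V}(\mathbf{L}'),\ \varphi' \models' P'} P'\right)$. (6)

$\mathbf{M}(b_2(\varphi')) = b^{-1}(\{P'_\mu \ /\ \varphi' \models' P'_\mu\}) = \{\mu \in \mathbf{M} \ /\ \varphi' \not\models' P'_\mu\}$. (7)

From (4) and (6) we get, for any $\varphi \in \mathbf{L}$: $b_2(b_1(\varphi)) = \varphi$. (8)

The restriction of $b_2$ to the subset $\mathbf{L}'_C$ of $\mathbf{L}'$ is $(\bar{b})^{-1}$, a one-to-one mapping from $\mathbf{L}'_C$
onto $\mathbf{L}$. Indeed we get, P' ranging over $\mathcal{V}(\mathbf{L}')$:

if $\varphi' \in \mathbf{L}'_C$, then $\varphi' = \bigwedge_{\varphi' \models' P'} P' \wedge \bigwedge_{\varphi' \not\models' P'} \neg P'$. (9)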
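
The mappings of this construction can be checked mechanically on a two-atom vocabulary. The sketch below is only an illustration under several assumptions: a formula of the original language is identified with its set of models, a formula of the new language with its set of models too, and the new atom attached to an interpretation is represented by that interpretation itself. The Python names (`b`, `b_bar`, `b1`, `b2`) mirror the mappings discussed above but the encoding is hypothetical.

```python
from itertools import combinations


def powerset(items):
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


VOCAB = {"x", "y"}
M = frozenset(powerset(VOCAB))        # the 4 interpretations of the original language
ATOMS = M                             # the new atom for mu is labelled by mu itself
M_PRIME = frozenset(powerset(ATOMS))  # the 16 interpretations of the new language
FORMULAS = powerset(M)                # the 16 formulas, given as sets of models


def b(models):
    """Equation (2): the atoms attached to the non-models."""
    return M - models


def b_bar(models):
    """Equation (3): a complete formula, whose only model is b(models)."""
    return frozenset({b(models)})


def b1(models):
    """Equation (4): conjunction of atoms; models are the supersets of b(models)."""
    return frozenset(mp for mp in M_PRIME if b(models) <= mp)


def b2(models_prime):
    """Equation (7): keep mu when its atom is not entailed by the formula."""
    return frozenset(mu for mu in M if any(mu not in mp for mp in models_prime))


# Property (5): inclusion of model sets is reversed by b.
property_5 = all((m1 <= m2) == (b(m2) <= b(m1))
                 for m1 in FORMULAS for m2 in FORMULAS)
# Property (8), and the fact that b2 inverts b_bar on complete formulas.
property_8 = all(b2(b1(phi)) == phi for phi in FORMULAS)
inverse_on_complete = all(b2(b_bar(phi)) == phi for phi in FORMULAS)
# The models of b1(phi) are the images by b of the formulas entailing phi.
image_of_w = all(b1(phi) == frozenset(b(psi) for psi in FORMULAS if psi <= phi)
                 for phi in FORMULAS)
```

The encoding makes the size of the new vocabulary explicit: one atom per interpretation of the original language, hence exponentially many.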
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If if' G \Jc, then M'(v 3') is the singleton = {^'} for = {P' G V{\J)/ip' h' 

P'}, and we get: M( 62 (v 3 ')) = 

It is convenient to introduce the “exhaustive conjunction” ip' = Ap'sp(L') P - 
We suppose here that (p' = V p)' , with G L^. This means that there exists a 
subset P' of V (L') such that p' = Ap'ev(L')-P' ^ (Ap'eP' ^ Ap'ep' -PO- 

We get: if p' = V ip' with G Lj^, then 62(A) = (6)~^(Ai) ( 10 ) 

From ( 7 ), we get that 62 preserves V : 62(^1 V = ('2(^1) V 62(A)- (H) 

We get then, reminding 62 = (6)"i on L'^: 62(A) = Vc^'eL^. 

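We define the following preference relation $\prec'$ on $\mathbf{M}'$ (remember section 2 for $\theta$):

for any $\mu', \nu'$ in $\mathbf{M}'$, $\mu' \prec' \nu'$ iff $\theta(b^{-1}(\mu')) \prec_s \theta(b^{-1}(\nu'))$. (12)

Thus $\prec'$ is the image in $\mathbf{M}'$ of the relation $\prec_s$ on $\mathbf{L}$. It is a strict order and there exists a
set $\Phi'$ of formulas in $\mathbf{L}'$ such that $f_{\prec'} = CIRCF(\Phi', \emptyset, \mathcal{V}(\mathbf{L}'))$ (cf Theorem 4.1-7).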

We know from ( 5 ) and ( 4 ) that bi{ip) has for set of models the set associated to the 
set W(v3) by 6 (or 6 if we consider Lc instead of M). Thus, {(f) is the reverse image 

of the setMA(6i(A): W^AA = (6)-A^?'(MA(6 i(A))) = {(&)-AAc)/ Ac e 

\Jq,M!{(p'^ = {A} with A G MA( 6 i(v 3 ))}. From Definition 3.3 we have, for any 
V? G L, AAA = V^i6w^, ifi. We get thus, from the definition of -<'\ for any 
A G the only model A of A In M^/(6i((/?)) iff the formula (6)“^(A) is in 
W^A'F)- ^s a is in L(., we get (see ( 9 )): (6)“^ (A) = 62(A)- 8®t then 

f \/ (ip) \/ with tA' — and (t>i((y3)) ' 

From (11) we get AA<f) = (»2(V^'6L'^ with m'(^')={^'} and p'eMp,{bi{v)) A)- 

Wegetthen AAA = ('2(/A(A(A))- 

If we choose ≺_nf as our ≺_g, we get W_{≺g}(φ) = {⊥, f(φ)} from Proposition 6.4.

Thus, M'_{≺'}(b_1(φ)) has at most two elements, V(L') and the subset μ' of V(L') which
is the only other model of f_{≺'}(b_1(φ)), if there is another model. As moreover we
can apply (10) in this case, these peculiarities greatly simplify the effective computation.

It remains to check the conditions. (^ 1 ): (f \= ip iff M((/?) C M(A) iff M — M(A) C 
M — M((^) and, from ( 4 ) we get that if M — M(A) C M — M((p), then 6i((/?) \=' 61(A). 

(t=^ 2 ): We prove that 61 062 is weakening on \J . For any (p' G L', we get 61 (62 (t/sO) = 
Ap'61/(l'). v'h'P' and (6), thus A K 6i(62(A))- 

(^ 3 ): For any G L, we get M(62(/'(6i(A))) = G M / /'(61(A) W P'p} 
from ( 7 ). As /' is a pre-circumscription, we have f'{bi{p)) bi{p), thus we get 
M(62(/'(6 i(</5)))) C G M / bi{p) A' Pp}- Now we have M( 62 ( 6 i(A)) = 
{p G M / 6i(A W Pp}- Thus we get M{b2{f {bi{p)))) C M(62(6i((f))). i-e- 
62(/'(6 i(A)) h 62(6i(A).i-e-, from(8): 62(/'(6i(A)) h -F- 

(t=^ 4 ): From M(62(A)) = {m G M / A P'fi} O) we get that, if A \=' Ihon 
we have 62(A) \=' 62 (AO- 
As the four conditions are satisfied, the translation preserves (LOOP) and also (CM) 
and (CT) (Proposition 5.1). The preservation of (CT) is interesting: from Theorem 4.1-2, 
if /' is a general preferential entailment, and if / is defined from /' as here, then / is 
a general preferential entailment. Thus this translation preserves the main properties 
which can be preserved in this case. Notice finally that, as the proof of the “if side” does 
not require condition (?^4), we can formulate the theorem with or without (^4). □ 

A consequence of this proof is the following result: 

Corollary 7.1. A pre-circumscription satisfies (LOOP) iff it is a general preferential 
entailment defined by a simplified general preference relation which is a strict order. □ 

We do not know what happens if V(L) is infinite. A characterization of formula
circumscription is known, but (sf) alone is not enough [12]. Moreover, b_1 and b_2 should be
defined for each theory T ∈ 𝐓 and not only for φ ∈ L, which would complicate the matter.

Notice that we could use a slightly smaller vocabulary L', starting from the set /(L) 
instead of the set L, and from instead of -<nf- However, this would complicate 
very seriously the definitions of b_1 and b_2 and we would lose the main advantage of
our translation, the easy and natural definitions of b_1 and b_2.

The characterization result extends as follows to general preferential entailments: 

Theorem 7.2. A pre-circumscription f in L satisfies (CT) iff it can be expressed by 
(Defi=^4) from a preferential entailment f' = f^i defined in a language L^ 

Proof: Notice that V (L) must be finite, as for Theorem 7.1 and its corollary. 

if: Preferential entailments satisfy (CT), thus f satisfies (CT) from Proposition 5.1,
notice however that (Def^) would not suffice here. 

only if: If f satisfies (CT), it is a general preferential entailment defined e.g. by the
simplified relation ≺_f introduced in [10, Definition 5.7]. We define L', b, b_1, b_2 and the
preference relation ≺' in L' as in the proof of Theorem 7.1, ≺' being defined from ≺_f
exactly as in (12) from ≺_g. From the properties of b_1 and b_2 (mainly from (11)), we get
then, as in the proof of Theorem 7.1: f(φ) = b_2(f_{≺'}(b_1(φ))) for any φ ∈ L. □

This result adapts to finite general preferential entailment the characterization result 
[13, Theorem 4.8 and Preservation result 6.21] showing how to express any finite multi 
preferential entailment as a preferential entailment in a greater language. 



8 A Detailed Example 

Example 8.1. V(L) = {P}, f = f_{≺g} where ≺_g is defined by ⊥ ≺_g P and ⊥ ≺_g ¬P.

We get f{p) = _L if G {-L, P, ^P} and /(T) = T and also ~<g= ^ =<nf= <nf- f 
falsifies (CR): f(P ∨ ¬P) ⊭ f(P) ∨ f(¬P). Thus f is one of the simplest examples of
a general preferential entailment which is not a multi preferential entailment. It is easy
to check that f satisfies (LOOP) here, thus also (CUMU) (cf Theorem 4.1-3).
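
The failure of (CR) in this example can be checked mechanically. The following is a minimal sketch, assuming that formulas over V(L) = {P} are represented by their sets of models and that (CR) reads f(φ ∨ ψ) ⊨ f(φ) ∨ f(ψ); the names `f`, `entails` and `disj` are illustrative and not part of the original text.

```python
# Interpretations over V(L) = {P}: the empty set (P false) and {P} (P true).
M = [frozenset(), frozenset({"P"})]

# Up to logical equivalence, a formula is represented by its set of models.
TOP = frozenset(M)
BOTTOM = frozenset()
P = frozenset({frozenset({"P"})})
NOT_P = frozenset({frozenset()})


def f(phi):
    """f of Example 8.1: f(T) = T and f(phi) = bottom for every other formula."""
    return TOP if phi == TOP else BOTTOM


def entails(phi, psi):
    """phi entails psi iff every model of phi is a model of psi."""
    return phi <= psi


def disj(phi, psi):
    """Models of a disjunction: union of the two model sets."""
    return phi | psi


# (CR), as assumed here, would require f(P v ~P) to entail f(P) v f(~P).
lhs = f(disj(P, NOT_P))
rhs = disj(f(P), f(NOT_P))
cr_holds = entails(lhs, rhs)
```

Here `lhs` is the model set of ⊤ while `rhs` is the model set of ⊥, so `cr_holds` is false, in agreement with the example.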

As f satisfies (LOOP), we apply Theorem 7.1, defining ≺' from ≺_s = ≺_g = ≺_nf. We
define p and V(L') as follows: p(∅) = P'_0, p({P}) = P'_1, V(L') = {P'_0, P'_1}, getting
Table 1. Computation of b_1 [and of W(φ) and W_{≺g}(φ)] for each φ ∈ L

  L     P(M)        M'               L'_c             b_1(L) ⊂ L'    [∈ P(L)]       [∈ P(L)]
  φ     M(φ)        b(M(φ))          b(φ)             b_1(φ)         W(φ)           W_{≺g}(φ)
  ------------------------------------------------------------------------------------------
  ⊤     {∅, {P}}    ∅                ¬P'_0 ∧ ¬P'_1    ⊤              {⊤, P, ¬P, ⊥}  {⊤, ⊥}
  P     {{P}}       {P'_0}           P'_0 ∧ ¬P'_1     P'_0           {P, ⊥}         {⊥}
  ¬P    {∅}         {P'_1}           ¬P'_0 ∧ P'_1     P'_1           {¬P, ⊥}        {⊥}
  ⊥     ∅           {P'_0, P'_1}     P'_0 ∧ P'_1      P'_0 ∧ P'_1    {⊥}            {⊥}


M' = {∅, {P'_0}, {P'_1}, {P'_0, P'_1}}. Table 1 describes b and b_1. We get then ≺' described
as follows in M': {P'_0, P'_1} ≺' {P'_0}, {P'_0, P'_1} ≺' {P'_1}.

We get W_{≺g}(⊤) = {⊤, ⊥} and W_{≺g}(φ) = {⊥} for φ ∈ {P, ¬P, ⊥}.

Using the method given in [15], we get a set Φ' of formulas to circumscribe: We define
the greatest pre-order (reflexive and transitive relation) ⪯' on M', satisfying (μ' ≺' ν'
iff μ' ⪯' ν' and not ν' ⪯' μ'): {P'_0, P'_1} ⪯' {P'_0}, {P'_0, P'_1} ⪯' {P'_1}, {P'_0} ⪯' {P'_1},
{P'_1} ⪯' {P'_0} and μ' ⪯' μ'. Then for each μ' ∈ M', we define the formula
φ'(μ') ∈ L' having for set of models μ' and its successors for ⪯', getting a set Φ' =
{φ'(∅), φ'({P'_0}), φ'({P'_0, P'_1})} such that f' = f_{≺'} = CIRCF(Φ', ∅, V(L')) (Φ' is
optimal in cardinality for describing f' as a formula circumscription):



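  φ'(μ')                                        M'(φ'(μ'))
  -----------------------------------------------------------------------
  φ'(∅) = ¬P'_0 ∧ ¬P'_1                         {∅}
  φ'({P'_0}) = φ'({P'_1}) = ¬(P'_0 ⇔ P'_1)      {{P'_0}, {P'_1}}
  φ'({P'_0, P'_1}) = P'_0 ∨ P'_1                {{P'_0}, {P'_1}, {P'_0, P'_1}}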



As we get φ'(∅) = ¬φ'({P'_0, P'_1}), the formula φ'(∅) (or equivalently ¬φ'(∅)) is
“fixed” in the circumscription [5], which can help the computation. It is easy to check
that this is always true (adding disjunctions of formulas to a set does not modify the
circumscription of the set of formulas [15]): the formula associated to the set of models
M' − M'(b(f(L))) is always obtained by the construction, while the formula associated
to the complementary set M'(b(L)) is the disjunction of the other formulas obtained.

Table 2 describes f' and b_2. Only the framed values are used by the method. The
first column gives the formulas φ' ∈ L' (shortly framed when φ' is in the set b_1(L), i.e.
is a conjunction of atoms). The second column describes f' = CIRCF(Φ', ∅, V(L')):
in fact, we only need the (framed) values of f'(φ') for the four values in b_1(L). The next
three columns give respectively M'(φ'), M(b_2(φ')) and the formula b_2(φ') ∈ L (we
need only to consider the two formulas φ' in the set f'(b_1(L)), framed in the f' column;
we have made this apparent by long frames in the φ' and b_2 columns).

From the values of b_1(φ) for the four φ ∈ L (Table 1), we compute b_2(f'(b_1(φ))),
and check that we get indeed b_2(f'(b_1(φ))) = f(φ) [f(φ) = ∨_{ψ ∈ W_{≺g}(φ)} ψ].
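
This check can also be run mechanically. The sketch below is only an illustration under stated assumptions: formulas are represented by their sets of models; the atom names "P0" and "P1" stand for P'_0 = p(∅) and P'_1 = p({P}); b_1 is hard-coded from Table 1; ≺' is the relation of the example (the interpretation containing both atoms is preferred to each singleton); and b_2 follows a reading of (7) in which μ is a model of b_2(φ') exactly when φ' does not entail the atom p(μ). None of the Python names below come from the original text.

```python
EMPTY, MU_P = frozenset(), frozenset({"P"})
M = [EMPTY, MU_P]                      # interpretations of L
p = {EMPTY: "P0", MU_P: "P1"}          # p(mu): the atom of L' attached to mu
BOTH = frozenset({"P0", "P1"})
M_PRIME = [frozenset(), frozenset({"P0"}), frozenset({"P1"}), BOTH]

# Formulas of L, each given by its set of models.
TOP, BOT = frozenset(M), frozenset()
P, NOT_P = frozenset({MU_P}), frozenset({EMPTY})


def conj_of_atoms(atoms):
    """Models (in M') of a conjunction of atoms of L'."""
    return frozenset(m for m in M_PRIME if set(atoms) <= m)


# b_1 as read from Table 1 (assumption: T, P0, P1, P0 & P1).
b1 = {TOP: conj_of_atoms([]), P: conj_of_atoms(["P0"]),
      NOT_P: conj_of_atoms(["P1"]), BOT: conj_of_atoms(["P0", "P1"])}

# (mu, nu) in PREFERRED means mu is strictly preferred to nu.
PREFERRED = {(BOTH, frozenset({"P0"})), (BOTH, frozenset({"P1"}))}


def f_prime(models):
    """Preferential entailment: keep the models that no other model beats."""
    return frozenset(m for m in models
                     if not any((n, m) in PREFERRED for n in models))


def b2(models):
    """Assumed reading of (7): mu is kept iff some model falsifies p(mu)."""
    return frozenset(mu for mu in M if any(p[mu] not in m for m in models))


def f(phi):
    """f of Example 8.1."""
    return TOP if phi == TOP else BOT


translation_ok = all(b2(f_prime(b1[phi])) == f(phi) for phi in b1)
```

Under these assumptions `translation_ok` is true for the four formulas ⊤, P, ¬P and ⊥.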



Table 2. Theorem 7.1 applied to example 8.1 (only the six framed computations are used)

  L'                  [∈ L']              P(M')                              [∈ P(M)]     [∈ L]
  φ'                  f'(φ')              M'(φ')                             M(b_2(φ'))   b_2(φ')
  -------------------------------------------------------------------------------------------------
  ⊤                   P'_0 ⇔ P'_1         {∅, {P'_0}, {P'_1}, {P'_0, P'_1}}  {∅, {P}}     ⊤
  P'_0 ∨ P'_1         P'_0 ∧ P'_1         {{P'_0}, {P'_1}, {P'_0, P'_1}}     {∅, {P}}     ⊤
  P'_0 ∨ ¬P'_1        P'_0 ⇔ P'_1         {∅, {P'_0}, {P'_0, P'_1}}          {∅, {P}}     ⊤
  ¬P'_0 ∨ P'_1        P'_0 ⇔ P'_1         {∅, {P'_1}, {P'_0, P'_1}}          {∅, {P}}     ⊤
  ¬P'_0 ∨ ¬P'_1       ¬P'_0 ∨ ¬P'_1       {∅, {P'_0}, {P'_1}}                {∅, {P}}     ⊤
  P'_0                P'_0 ∧ P'_1         {{P'_0}, {P'_0, P'_1}}             {{P}}        P
  P'_1                P'_0 ∧ P'_1         {{P'_1}, {P'_0, P'_1}}             {∅}          ¬P
  P'_0 ⇔ P'_1         P'_0 ⇔ P'_1         {∅, {P'_0, P'_1}}                  {∅, {P}}     ⊤
  ¬(P'_0 ⇔ P'_1)      ¬(P'_0 ⇔ P'_1)      {{P'_0}, {P'_1}}                   {∅, {P}}     ⊤
  ¬P'_0               ¬P'_0               {∅, {P'_1}}                        {∅, {P}}     ⊤
  ¬P'_1               ¬P'_1               {∅, {P'_0}}                        {∅, {P}}     ⊤
  P'_0 ∧ P'_1         P'_0 ∧ P'_1         {{P'_0, P'_1}}                     ∅            ⊥
  P'_0 ∧ ¬P'_1        P'_0 ∧ ¬P'_1        {{P'_0}}                           {{P}}        P
  ¬P'_0 ∧ P'_1        ¬P'_0 ∧ P'_1        {{P'_1}}                           {∅}          ¬P
  ¬P'_0 ∧ ¬P'_1       ¬P'_0 ∧ ¬P'_1       {∅}                                {∅, {P}}     ⊤
  ⊥                   ⊥                   ∅                                  ∅            ⊥



9 Conclusion and Perspective

We have extended the “expressive power of circumscription”, by showing that not only
cumulative multi preferential entailments, as shown by Costello [4], but also general
preferential entailments satisfying (LOOP), can be translated into circumscriptions in
another vocabulary. These various kinds of preferential entailment are introduced in
Kraus et al. [7]. In order to achieve this translation, we needed two results. Firstly, the
notion of general preferential entailment, as introduced in [7], is overly general [10]: we
do not need copies of theories (or equivalently, of sets of interpretations). We can define
the relation in the simpler set of the theories. Doing this, we have simplified some results
in [7]: cumulative inferences correspond to general preferential entailments defined by
a simplified relation satisfying (sf), also known as “smooth” (a result already given in
[1,2] in much more complex ways). Secondly, we have described a modification of the
vocabulary which allows us to transpose any general preference relation among theories into
a preference relation among complete theories (or among interpretations). This method
needs a huge auxiliary vocabulary; however, only a very simple and small subclass
of formulas in the new vocabulary (the conjunctions of atoms) needs to be considered.
Moreover, the translation formulas from the old vocabulary to the new one and back are
easy to compute. Thus, the method should be really applicable.

These results should have applications in helping the automatic computations of
non monotonic formalisms. The modification of vocabulary introduced here could have
other applications, as it is rather general, and relatively simple. Also, the simplification
of the originally overly complex notion of general preferential entailment should help
future studies on the subject: it is much easier to work with relations among theories
than with relations among arbitrary sets of copies of theories. Finally, the translation
results given here should also have real applications. This is obvious for the result
allowing us to translate any finite general preferential entailment satisfying (LOOP) into
a circumscription. Indeed, the work on automatic computation of circumscription is
still very active, and our work shows that any progress could be applied, not only to
cumulative preferential entailments, as already known, but also to the strictly more
general notion of general preferential entailment satisfying (LOOP). Also, the result
showing how to express any finite cumulative general preferential entailment (a yet
strictly more general notion) in terms of preferential entailment (where the relation is
directly among interpretations) should have applications, since the notion of ordinary
preferential entailment is simpler and more studied than the notion of general preferential
entailment.

More studies are needed in order to apply these computations. Moreover, we are 
still waiting for efficient ways of computing ordinary preferential entailments, or even 
formula circumscriptions. At least we know now that not only multi, but also general, 
preferential entailments, would benefit from these demonstrators. 
Abstract. Quasi-classical logic (QC logic) allows the derivation of non- 
trivial classical inferences from inconsistent information. A paraconsis- 
tent, or non-trivializable, logic is, by necessity, a compromise, or weak- 
ening, of classical logic. The compromises on QC logic seem to be more 
appropriate than other paraconsistent logics for applications in comput- 
ing. In particular, the connectives behave in a “classical manner” at the 
object level so that important proof rules such as modus tollens, modus 
ponens, and disjunctive syllogism hold. Here we develop QC logic by 
presenting a semantic tableau version for first-order QC logic. 



1 Introduction 

Paraconsistent reasoning is important in handling inconsistent information, and 
there have been a number of proposals for paraconsistent logics (for a review 
see [Hun98]). However, developing non-trivializable, or paraconsistent, logics
necessitates some compromise, or weakening, of classical logic. Key paraconsistent
logics such as C_ω [dC74] achieve this by weakening the classical connectives,
particularly negation. However, this results in useful proof rules such as disjunctive
syllogism failing, and intuitive equivalences such as ¬α ∨ β ≡ α → β failing.

An alternative, called quasi-classical (or QC) logic, is to restrict the proof 
theory [BH95,Hun00a]. In this restriction, compositional proof rules (for exam- 
ple, disjunction introduction) cannot be followed by decompositional rules (for 
example, resolution). Whilst this gives a logic that is weaker than classical logic, 
it does mean that the connectives behave classically at the object level. We 
believe the logic is appealing for reasoning with inconsistencies arising in
applications such as systems development [HN98], and for reasoning with structured
text [Hun00b].

In this paper, we present a first-order version of paraconsistent logic. First we 
give the semantics for the first-order language and then give a semantic tableau 
version of the proof theory. 
2 First-Order QC Logic

First-order QC logic is a development of QC logic as defined in [Hun00a]. We
assume the usual classical definitions for the language including definitions for 
a free variable, a bound variable, a ground term, and a ground formula. 

Definition 1. The language of first-order QC logic is that of classical first-order
logic. We let ℒ denote a set of formulae formed in the usual way from a set of
predicate symbols, a set of function symbols, a set of variable symbols, and the
connectives¹ {¬, ∨, ∧}. We also assume there is at least one zero-place function
symbol in the set of function symbols.



Definition 2. Let α be an atom, and let ~ be a complementation operation
such that ~α is ¬α and ~(¬α) is α. The ~ operator is not part of the object
language, but it makes some definitions clearer.



Definition 3. Let α_1 ∨ .. ∨ α_n be a clause that includes a literal disjunct α_i. The
focus of α_1 ∨ .. ∨ α_n by α_i, denoted ⊗(α_1 ∨ .. ∨ α_n, α_i), is defined as the clause
obtained by removing α_i from α_1 ∨ .. ∨ α_n. In the case of a clause with just one
disjunct, we assume ⊗(α_1, α_1) = ⊥.



Example 1. Let α ∨ β ∨ γ be a clause where α, β, and γ are literals. Hence,
⊗(α ∨ β ∨ γ, β) = α ∨ γ.
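
The focus operation is easy to mirror in code. The following is a minimal sketch, assuming that a clause is represented as a tuple of literal names and that ⊥ is represented by the string "⊥"; the function name `focus` and this representation are illustrative choices, not part of the original definition.

```python
BOTTOM = "⊥"


def focus(clause, literal):
    """Return the clause obtained by removing one occurrence of `literal`.

    A clause is a tuple of literals; focusing a one-literal clause on its
    only disjunct yields BOTTOM, as assumed in Definition 3.
    """
    if literal not in clause:
        raise ValueError("the literal must be a disjunct of the clause")
    remaining = list(clause)
    remaining.remove(literal)  # removes the first occurrence only
    return tuple(remaining) if remaining else BOTTOM


# Example 1: the focus of (alpha v beta v gamma) by beta is (alpha v gamma).
example = focus(("alpha", "beta", "gamma"), "beta")
single = focus(("alpha",), "alpha")
```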

The notion of a model in first-order QC logic is based on a form of Herbrand 
interpretation. 

Definition 4. The Herbrand universe of ℒ is the set of ground terms in ℒ
and is denoted U(ℒ). The Herbrand base of ℒ is the set of ground atoms in
ℒ formed using the Herbrand universe of ℒ and is denoted B(ℒ).



Definition 5. Let B(ℒ) be the Herbrand base of ℒ. Let O(ℒ) be the set of
objects defined as follows, where +α is a positive object, and −α is a negative
object.

    O(ℒ) = {+α | α ∈ B(ℒ)} ∪ {−α | α ∈ B(ℒ)}

We call any E ∈ ℘(O(ℒ)) a model. So E can contain both +α and −α for some
ground atom α.

We can consider the following meaning for positive and negative objects being
in or out of some model E,

    +α ∈ E means α is “satisfiable” in the model
    −α ∈ E means ¬α is “satisfiable” in the model
    +α ∉ E means α is not “satisfiable” in the model
    −α ∉ E means ¬α is not “satisfiable” in the model

Since we can allow both an atom and its complement to be satisfiable, we
have decoupled, at the level of the model, the link between a formula and its
complement. In contrast, in a classical model, if a model satisfies a literal, then
it is forced to not satisfy the complement of the literal.

¹ To provide a succinct presentation, we do not consider implication here.

We formalise the notion of satisfiability and extend it to the rest of the 
language using the following definitions. 

Definition 6. An assignment is a function from the set of variables used in ℒ
to U(ℒ).



Definition 7. Given an assignment A, an X-variant assignment A' is the same
as A except perhaps in the assignment for X.



Definition 8. For an assignment A, terms in ℒ are interpreted as follows,
where [·]^A is a function from the terms in ℒ to U(ℒ).

    [c]^A = c where c is a constant symbol.

    [X]^A = A(X) where X is a variable symbol.

    [f(t_1, .., t_n)]^A = f([t_1]^A, .., [t_n]^A) where f is a function symbol
    and t_1, .., t_n are terms.

In this, each ground term in ℒ is interpreted as the equivalent term in U(ℒ).
Hence the interpretation of terms is independent of the choice of model.

Definition 9. For an assignment A, an atom α(t_1, .., t_n) in ℒ is interpreted as
follows, where ⊨_h is a satisfiability relation called Herbrand satisfaction.

    (E, A) ⊨_h α(t_1, .., t_n) iff +α([t_1]^A, .., [t_n]^A) ∈ E

    (E, A) ⊨_h ¬α(t_1, .., t_n) iff −α([t_1]^A, .., [t_n]^A) ∈ E
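
Definitions 8 and 9 can be illustrated with a short sketch. It assumes a purely illustrative encoding that is not part of the original text: a constant is a lowercase string, a variable is a string starting with an uppercase letter, a function application is a tuple whose first element is the function symbol, and an object of a model is a pair of a sign ("+" or "-") and a ground atom.

```python
def interpret(term, assignment):
    """[t]^A: replace each variable of `term` by the ground term assigned to it."""
    if isinstance(term, tuple):                 # function application f(t1, .., tn)
        functor, *args = term
        return (functor, *[interpret(arg, assignment) for arg in args])
    if term[0].isupper():                       # variable symbol (assumed convention)
        return assignment[term]
    return term                                 # constant symbol


def herbrand_satisfies(model, assignment, literal):
    """(E, A) |=_h literal, where literal = (sign, predicate, argument terms)."""
    sign, predicate, args = literal
    ground_atom = (predicate, tuple(interpret(t, assignment) for t in args))
    return (sign, ground_atom) in model


# A model containing -alpha(a, a) and +beta(f(a)), with A = {X -> a, Y -> a}.
E = {("-", ("alpha", ("a", "a"))), ("+", ("beta", (("f", "a"),)))}
A = {"X": "a", "Y": "a"}

beta_holds = herbrand_satisfies(E, A, ("+", "beta", [("f", "a")]))
alpha_holds = herbrand_satisfies(E, A, ("+", "alpha", ["X", "Y"]))
not_alpha_holds = herbrand_satisfies(E, A, ("-", "alpha", ["X", "Y"]))
```

With this model, β(f(a)) and ¬α(X, Y) are Herbrand satisfied under A, while α(X, Y) is not.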

Using Herbrand satisfaction, we define two further satisfaction relations, 
namely strong satisfaction and weak satisfaction, that allow us to define an 
entailment relation. Essentially, the equivalences in strong satisfaction allow for 
any formula in C to be rewritten into a conjunctive normal form, and then into 
clauses, which can be evaluated with respect to the objects in the model. In 
addition, the definition for disjunction captures a form of resolution in the se- 
mantics. This is needed because the classical relationship between a positive and 
negative literal has been decoupled. 

Definition 10. Let ⊨_s be a satisfiability relation called strong satisfaction.
For a model E, and an assignment A, we define ⊨_s as follows, where α_1, ..., α_n
are literals in ℒ, and α is a literal in ℒ.

    (E, A) ⊨_s α iff (E, A) ⊨_h α

    (E, A) ⊨_s α_1 ∨ ... ∨ α_n
        iff [(E, A) ⊨_s α_1 or ... or (E, A) ⊨_s α_n]
        and ∀i s.t. 1 ≤ i ≤ n
            [(E, A) ⊨_s ~α_i implies (E, A) ⊨_s ⊗(α_1 ∨ ... ∨ α_n, α_i)]

For α, β, γ ∈ ℒ, we extend the definition as follows,

    (E, A) ⊨_s α ∧ β iff (E, A) ⊨_s α and (E, A) ⊨_s β
    (E, A) ⊨_s ¬¬α ∨ γ iff (E, A) ⊨_s α ∨ γ
    (E, A) ⊨_s ¬(α ∧ β) ∨ γ iff (E, A) ⊨_s ¬α ∨ ¬β ∨ γ
    (E, A) ⊨_s ¬(α ∨ β) ∨ γ iff (E, A) ⊨_s (¬α ∧ ¬β) ∨ γ
    (E, A) ⊨_s α ∨ (β ∧ γ) iff (E, A) ⊨_s (α ∨ β) ∧ (α ∨ γ)
    (E, A) ⊨_s α ∧ (β ∨ γ) iff (E, A) ⊨_s (α ∧ β) ∨ (α ∧ γ)

    (E, A) ⊨_s (∃Xα) ∨ β iff for some X-variant assignment A', (E, A') ⊨_s α ∨ β
    (E, A) ⊨_s (∀Xα) ∨ β iff for all X-variant assignments A', (E, A') ⊨_s α ∨ β
    (E, A) ⊨_s (¬∃Xα) ∨ β iff for all X-variant assignments A', (E, A') ⊨_s ¬α ∨ β
    (E, A) ⊨_s (¬∀Xα) ∨ β iff for some X-variant assignment A', (E, A') ⊨_s ¬α ∨ β
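
The clause case of strong satisfaction can be sketched for ground clauses as follows. This is only an illustration under assumptions that are not in the original text: a literal is a pair (sign, atom) with sign "+" or "-", a model is a set of such pairs, a clause is a tuple of literals, and the empty clause (standing for ⊥) is never satisfied.

```python
def complement(literal):
    """The ~ operation: flip the sign of a literal."""
    sign, atom = literal
    return ("-" if sign == "+" else "+", atom)


def strong_sat(model, clause):
    """Strong satisfaction of a ground clause, following the disjunction case."""
    if not clause:                       # assumed: the empty clause is never satisfied
        return False
    if len(clause) == 1:                 # a literal: Herbrand satisfaction
        return clause[0] in model
    if not any(lit in model for lit in clause):
        return False
    # Whenever the complement of a disjunct is satisfied, the focus of the
    # clause by that disjunct must be strongly satisfied as well.
    return all(
        strong_sat(model, clause[:i] + clause[i + 1:])
        for i, lit in enumerate(clause)
        if complement(lit) in model
    )


A, NOT_A, B = ("+", "a"), ("-", "a"), ("+", "b")

consistent = strong_sat({A}, (A, B))
conflicting = strong_sat({A, NOT_A}, (A, B))
resolved = strong_sat({A, NOT_A, B}, (A, B))
```

In the second call the model contains both +a and −a, so the "resolution" condition demands that b be satisfied too; it is not, so the clause a ∨ b is not strongly satisfied.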

The definition for weak satisfaction is similar to strong satisfaction. The 
main difference is that the definition for disjunction is less restricted. Note, 
distributivity is implied by the definition of weak satisfaction. 

Definition 11. Let ⊨_w be a satisfiability relation called weak satisfaction.
For a model E, and an assignment A, we define ⊨_w as follows, where α_1, ..., α_n
are literals in ℒ and α is a literal in ℒ.

    (E, A) ⊨_w α iff (E, A) ⊨_h α

    (E, A) ⊨_w α_1 ∨ ... ∨ α_n iff [(E, A) ⊨_w α_1 or ... or (E, A) ⊨_w α_n]

For α, β, γ ∈ ℒ, we extend the definition as follows,

    (E, A) ⊨_w α ∧ β iff (E, A) ⊨_w α and (E, A) ⊨_w β
    (E, A) ⊨_w ¬¬α ∨ γ iff (E, A) ⊨_w α ∨ γ
    (E, A) ⊨_w ¬(α ∧ β) ∨ γ iff (E, A) ⊨_w ¬α ∨ ¬β ∨ γ
    (E, A) ⊨_w ¬(α ∨ β) ∨ γ iff (E, A) ⊨_w (¬α ∧ ¬β) ∨ γ

    (E, A) ⊨_w (∃Xα) ∨ β iff for some X-variant assignment A', (E, A') ⊨_w α ∨ β
    (E, A) ⊨_w (∀Xα) ∨ β iff for all X-variant assignments A', (E, A') ⊨_w α ∨ β
    (E, A) ⊨_w (¬∃Xα) ∨ β iff for all X-variant assignments A', (E, A') ⊨_w ¬α ∨ β
    (E, A) ⊨_w (¬∀Xα) ∨ β iff for some X-variant assignment A', (E, A') ⊨_w ¬α ∨ β
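
For ground clauses, weak satisfaction reduces to checking that some disjunct is in the model. The sketch below assumes the same illustrative encoding as before (a literal is a pair of a sign and an atom, a model is a set of such pairs); it is not taken from the original text.

```python
def weak_sat(model, clause):
    """Weak satisfaction of a ground clause: some disjunct is in the model."""
    return any(literal in model for literal in clause)


A, NOT_A, B = ("+", "a"), ("-", "a"), ("+", "b")

# A model with both +a and -a weakly satisfies a v b, since +a is present.
weak_with_conflict = weak_sat({A, NOT_A}, (A, B))
# The empty model weakly satisfies no clause, not even a v ~a.
weak_in_empty_model = weak_sat(set(), (A, NOT_A))
```

The first case shows the less restricted treatment of disjunction: no condition is placed on the remaining disjuncts when the complement of a disjunct is also in the model.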
Example 2. Let A = {a{a),^{h),'iX{-^a{X) V /3(X))}, 
and let B{C) = {a(a), /3(a), 7(a), a(&),/3(6), 7(6)}. 

Now let A= {X !->• a} and E = {+a(o), +/3(a), +7(6)}. 
This gives the following. 



{E,A) \=s a{a) 

(E,A) K 7(&) 

{E,A) K VX(-a(X) V /3(X)) 



Example 3. Let Δ = {∃X,Y α(X, Y), β(f(a))},

and let B(ℒ) = {α(a, a), β(a), α(f(a), a), β(f(a)), α(f(f(a)), a), β(f(f(a))), ..}.
Now let A = {X ↦ a, Y ↦ a} and E = {−α(a, a), +β(f(a))}.

This gives the following.

    (E, A) ⊨_s β(f(a))

    (E, A) ⊨_s ∃Xβ(X)

    (E, A) ⊭_s ∃X,Y α(X, Y)



Example 4- Let A = {VA/(-'a(Ar), o;(a))}, 
and let B{C) = {a(a)}. 

Now let A = {X !->• a} and E = {+a(o), — a(a)}. 

This gives the following. 

{E,A) Qf(a) {E,A) |=s -ia(a) 
{E, A) a{a) V /3(5) {E, A) a(a) V /3(&) 



Definition 12. We polymorphically extend strong satisfaction and weak
satisfaction as follows

    E ⊨_s α iff for all assignments A, (E, A) ⊨_s α
    E ⊨_w α iff for all assignments A, (E, A) ⊨_w α

In the following definition, we can see that QC entailment is of the same form 
as classical entailment except we use strong satisfaction for the assumptions and 
weak satisfaction for the inference. 

Definition 13. Let ⊨_Q be an entailment relation, called the QC entailment
relation, such that ⊨_Q ⊆ ℘(ℒ) × ℒ, and defined as follows.

    {α_1, .., α_n} ⊨_Q β

    iff for all E, if E ⊨_s α_1 and ... and E ⊨_s α_n then E ⊨_w β

We can consider the strong satisfaction relation as capturing the decomposition
of the set of assumptions. Strong satisfaction forces each resolvent β of a
clause α ∨ β to hold if ~α holds. In contrast, we can consider the weak
satisfaction relation as capturing the composition of formulae from resolvents, allowing
disjuncts to be introduced.
Example 5. The following illustrate QC entailment.

    {α(a), ∀X(¬α(X) ∨ ¬α(X))} ⊨_Q ¬α(a)

    {α(a), ∀X(¬α(X) ∨ β(X))} ⊨_Q β(a)

    {α(a)} ⊨_Q ∃Xα(X)

We can show that ⊨_Q is non-trivializable in the sense that when Δ is
classically inconsistent, it is not the case that every formula in ℒ is entailed by Δ.

Example 6. Let β, ¬β and α be ground literals in ℒ. So {β ∧ ¬β} is classically
inconsistent. However it is not the case that {β ∧ ¬β} ⊨_Q α holds, since E =
{+β, −β} is a model where E ⊨_s β ∧ ¬β, but E ⊭_w α.

However, many classical tautologies do not follow with ⊨_Q. In particular,
the classical tautologies do not follow from an empty set.

Example 7. Let Δ = ∅. Now consider the classical tautology α ∨ ¬α. Here Δ ⊨_Q
α ∨ ¬α does not hold, since E = ∅ strongly satisfies every formula in Δ, but E
does not weakly satisfy α ∨ ¬α.

This is one illustration of how QC logic is weaker than classical logic. A
number of further features of classical logic such as left logical equivalence,
conditionalization, cut, and right weakening also fail [Hun00a].
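
For ground clauses over a small set of atoms, QC entailment can be checked by brute force over all models. The sketch below is an illustration only, under assumptions not in the original text: a literal is a pair (sign, atom), a clause is a tuple of literals, the assumptions and the query are given as lists of clauses read conjunctively, and the empty clause is never satisfied. The function names are hypothetical.

```python
from itertools import combinations


def complement(literal):
    sign, atom = literal
    return ("-" if sign == "+" else "+", atom)


def strong_sat(model, clause):
    """Strong satisfaction of a ground clause (disjunction case with focus)."""
    if not clause:
        return False
    if len(clause) == 1:
        return clause[0] in model
    if not any(lit in model for lit in clause):
        return False
    return all(
        strong_sat(model, clause[:i] + clause[i + 1:])
        for i, lit in enumerate(clause)
        if complement(lit) in model
    )


def weak_sat(model, clause):
    """Weak satisfaction of a ground clause."""
    return any(lit in model for lit in clause)


def all_models(atoms):
    """Every subset of the objects +a, -a built from the given atoms."""
    objects = [(sign, atom) for atom in atoms for sign in "+-"]
    for size in range(len(objects) + 1):
        for chosen in combinations(objects, size):
            yield frozenset(chosen)


def qc_entails(assumptions, query, atoms):
    """True iff every model strongly satisfying the assumptions weakly satisfies the query."""
    for model in all_models(atoms):
        if all(strong_sat(model, c) for c in assumptions) and \
                not all(weak_sat(model, c) for c in query):
            return False
    return True


A, NOT_A = ("+", "a"), ("-", "a")
B, NOT_B = ("+", "b"), ("-", "b")

modus = qc_entails([(A,), (NOT_A, B)], [(B,)], ["a", "b"])       # {a, ~a v b} |= b
explosion = qc_entails([(B,), (NOT_B,)], [(A,)], ["a", "b"])     # {b & ~b} |= a ?
excluded_middle = qc_entails([], [(A, NOT_A)], ["a"])            # {} |= a v ~a ?
```

The three calls are ground, propositional analogues of the second entailment of Example 5, of Example 6 and of Example 7: the first holds, while the other two fail.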

3 The QC Semantic Tableau 

In order to provide an automated proof procedure, we adapt the tableau
approach for classical logic that was developed by Smullyan [Smu68]. For this
adaptation, we need the following definitions.

Definition 14. The set of signed formulae of ℒ is denoted ℒ* and is defined as
ℒ ∪ {α* | α ∈ ℒ}.

We will regard the formulae in ℒ* without the * symbol as satisfiable and
the formulae in ℒ* with the * symbol as unsatisfiable.

Definition 15. We further extend the weak satisfaction and strong satisfaction
relations as follows, where α ∈ ℒ.

    E ⊨_s α* iff E ⊭_s α
    E ⊨_w α* iff E ⊭_w α

Definition 16. For a formula α ∈ ℒ with free variable X, and a term t ∈ U(ℒ),
we let α[X/t] denote the substitution of all occurrences of X in α by t.

In the definition of QC semantic tableau, there are two types of decomposition 
rule. The first type is represented by the S-rules given in Definition 17 and the 
second type is represented by the U-rules given in Definition 18. All the S-rules 
assume the formula above the line is satisfiable, and all the U-rules assume the 
formula above the line is unsatisfiable. 
Definition 17. The following are the S-rules for a QC semantic tableau, where
t is in U(ℒ) and t' is in U(ℒ) but not occurring in the tableau constructed so
far. The | symbol denotes the introduction of a branch point in the QC semantic
tableau.

    α_1 ∨ ... ∨ α_n
    -------------------------------------     where α_1, .., α_n are literals
    (~α_i)* | ⊗(α_1 ∨ ... ∨ α_n, α_i)

    α_1 ∨ ... ∨ α_n
    -----------------     where α_1, .., α_n are literals
    α_1 | ... | α_n

    α ∧ β          ¬¬α ∨ γ
    -------        ---------
    α, β           α ∨ γ

    ¬(α ∧ β) ∨ γ        ¬(α ∨ β) ∨ γ
    --------------      ----------------
    ¬α ∨ ¬β ∨ γ         (¬α ∧ ¬β) ∨ γ

    α ∨ (β ∧ γ)             α ∧ (β ∨ γ)
    -------------------     -------------------
    (α ∨ β) ∧ (α ∨ γ)       (α ∧ β) ∨ (α ∧ γ)

    (∀Xα) ∨ γ        (¬∃Xα) ∨ γ        (∃Xα) ∨ γ         (¬∀Xα) ∨ γ
    -------------    --------------    --------------    ---------------
    (α[X/t]) ∨ γ     (¬α[X/t]) ∨ γ     (α[X/t']) ∨ γ     (¬α[X/t']) ∨ γ

We will refer to the first two rules as the disjunction S-rules, the next six rules
as the rewrite S-rules, and the last four rules as the quantification S-rules.

Definition 18. The following are the U-rules for a QC semantic tableau, where
t is in U(ℒ) and t' is in U(ℒ) but not occurring in the tableau so far. The |
symbol denotes the introduction of a branch point in the QC semantic tableau.

    (α ∨ β)*        (α ∧ β)*        (¬¬α ∨ γ)*
    ---------       ---------       -----------
    α*, β*          α* | β*         (α ∨ γ)*

    (¬(α ∧ β) ∨ γ)*        (¬(α ∨ β) ∨ γ)*
    -----------------      -------------------
    (¬α ∨ ¬β ∨ γ)*         ((¬α ∧ ¬β) ∨ γ)*

    ((∀Xα) ∨ γ)*         ((¬∃Xα) ∨ γ)*         ((∃Xα) ∨ γ)*        ((¬∀Xα) ∨ γ)*
    ------------------   -------------------   -----------------   ------------------
    ((α[X/t']) ∨ γ)*     ((¬α[X/t']) ∨ γ)*     ((α[X/t]) ∨ γ)*     ((¬α[X/t]) ∨ γ)*

We will refer to the first rule as the disjunction U-rule, the second rule as the
conjunction U-rule, the next three rules as the rewrite U-rules, and the last four
rules as the quantification U-rules.

Definition 19. A QC semantic tableau for a database Δ and a query α is a
tree such that: (1) the formulae in Δ ∪ {α*} are at the root of the tree; (2) each
node of the tree has a set of signed formulae; and (3) the formulae at each node
are generated by an application of one of the decomposition rules on a signed
formula at ancestors of that node.

The classical definition for semantic tableau incorporates a similar definition
to that of Definition 19. The major difference is that in the classical definition,
the root has Δ ∪ {¬α} where ¬α is the negation of the query. The reason QC
logic doesn't use this is that we have decoupled the classical relationship between
a formula and its complement.

Definition 20. A QC tableau is closed iff every branch is closed. A branch is
closed iff there is a formula β for which β and β* belong to that branch. A tableau
is open if there is an open branch. A branch is open if there are no more rules
that can be applied, and it is not closed.
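
The closure test of Definition 20 is straightforward to express in code. The sketch below assumes, purely for illustration, that a branch is a list of signed formulae, each given as a pair of a formula string and a flag that is true for starred formulae; it does not apply any decomposition rule and does not model the "no more rules can be applied" part of openness.

```python
def branch_closed(branch):
    """A branch is closed iff some formula occurs both unstarred and starred."""
    unstarred = {formula for formula, is_starred in branch if not is_starred}
    starred = {formula for formula, is_starred in branch if is_starred}
    return bool(unstarred & starred)


def tableau_closed(branches):
    """A tableau is closed iff every one of its branches is closed."""
    return all(branch_closed(branch) for branch in branches)


# Root for the database {~a v b, a} and the query b: a, (~a v b), b*.
root = [("a", False), ("~a v b", False), ("b", True)]

# Hand-written branches: the left one ends with a*, the right one adds b.
left = root + [("~~a", True), ("a", True)]
right = root + [("b", False)]

closed = tableau_closed([left, right])
root_only_closed = branch_closed(root)
```

Both hand-written branches contain a formula together with its starred version, so this tableau is closed, whereas the root on its own is not.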

The classical form of tableau also incorporates the definitions given in Defi- 
nition 20. In Proposition 8, below, we show that a database A implies a query 
a, by QC logic, if and only if a QC tableau for a database A and query a is 
closed. First we consider some examples in Figures 1, 2, and 3. 



            α, (¬α ∨ β), β*
               /        \
           ¬¬α*          β
            |
            α*

Fig. 1. Let Δ be {(¬α ∨ β), α}, and the query be β. This gives the root {α, (¬α ∨ β), β*},
and the tableau is closed. In this tableau, (¬α ∨ β) is decomposed to (~¬α)* in the left
branch, and ⊗(¬α ∨ β, ¬α) in the right branch, where (~¬α)* = ¬¬α*, and ⊗(¬α ∨ β, ¬α)
= β. The final step in the left branch is to obtain α* from ¬¬α*.



Fig. 2. Let Δ be {γ, δ, (¬γ ∨ ¬δ ∨ α)} and let the query be α ∨ β. This gives the root
{γ, δ, ((¬γ ∨ ¬δ) ∨ α), (α ∨ β)*}, and the tableau is closed.



    α(g(a)), ∀X(¬α(X) ∨ β(f(X))), β(f(g(a)))*
                      |
          ¬α(g(a)) ∨ β(f(g(a)))
             /               \
      ¬¬α(g(a))*          β(f(g(a)))
          |
      α(g(a))*

Fig. 3. Let Δ be {α(g(a)), ∀X(¬α(X) ∨ β(f(X)))} and the query be β(f(g(a))). This
gives the root {α(g(a)), ∀X(¬α(X) ∨ β(f(X))), β(f(g(a)))*}, and the tableau is closed.



4 Some Properties of the QC Semantic Tableau

Each branch in a QC semantic tableau delineates a class of pairs (E, A) where
E is a model and A is an assignment. As we decompose the formulae in a branch,
we refine the class of pairs (E, A).

Proposition 1. Each tableau rule given in Definition 17 is sound in the following
sense: If φ ∈ ℒ* is the formula above the line, and ψ ∈ ℒ* is the formula
below the line, and E is a model such that E ⊨_s φ, then E ⊨_s ψ.

Proof. For any formula φ ∈ ℒ at the root, if we assume that the formula is
satisfiable according to the ⊨_s relation, then we can also assume that any formula
resulting from the decomposition of φ using the S-rules is also satisfiable using the
⊨_s relation. We justify this for each of the S-rules as follows: (First disjunction
S-rule) According to Definition 10, for all E, A, if (E, A) ⊨_s α_1 ∨ .. ∨ α_n, then for
each α_i either (E, A) ⊭_s ~α_i or (E, A) ⊨_s ⊗(α_1 ∨ .. ∨ α_n, α_i). (Second disjunction
S-rule) According to Definition 10, for all E, A, if (E, A) ⊨_s α_1 ∨ .. ∨ α_n, then
(E, A) ⊨_s α_1 or ... or (E, A) ⊨_s α_n. (Quantification S-rules) Consider the rule
with ∀Xα above the line. Here, for all E, A, if (E, A) ⊨_s ∀Xα, then for any A'
that differs at most in X, and hence for any t ∈ U(ℒ), (E, A') ⊨_s α[X/t]. Now
consider the rule with ∃Xα above the line. Here, for all (E, A), if (E, A) ⊨_s ∃Xα,
then for some A' that differs at most in X, and hence for some t', (E, A') ⊨_s
α[X/t']. However, as we do not know which t', we remain impartial, and we
select some t' that has not yet been used in the tableau so far. The other two
quantification rules follow similarly. (Rewrite S-rules) The soundness for these
follows directly from Definition 10.
Proposition 2. Each tableau rule given in Definition 18 is sound in the following
sense: If φ ∈ ℒ* is the formula above the line, and ψ ∈ ℒ* is the formula
below the line, and E is a model such that E ⊨_w φ, then E ⊨_w ψ.

Proof. For any formula φ* at the root, if we assume that the formula is
unsatisfiable according to the ⊨_w relation, then we also assume that any formula
resulting from the decomposition of φ* using the U-rules is also unsatisfiable
using the ⊨_w relation. We justify this for each of the U-rules as follows:
(Disjunction U-rule) According to Definition 11, for all E, A, if (E, A) ⊭_w α ∨ β, then
(E, A) ⊭_w α and (E, A) ⊭_w β. (Conjunction U-rule) According to Definition
11, for all E, A, if (E, A) ⊭_w α ∧ β, then (E, A) ⊭_w α or (E, A) ⊭_w β.
(Quantification U-rules) Consider the rule with (∃Xα)* above the line. Here, for all
(E, A), if (E, A) ⊭_w ∃Xα, then for any A' that differs at most in X, and hence
for any t ∈ U(ℒ), (E, A') ⊭_w α[X/t]. Now consider the rule with (∀Xα)* above
the line. Here, for all (E, A), if (E, A) ⊭_w ∀Xα, then there is an A' that differs
at most in X, and hence a t' ∈ U(ℒ), such that (E, A') ⊭_w α[X/t']. However,
we do not know which t', and so to remain impartial, we select some t' that
has not yet been used in the tableau so far. The other two quantification rules
follow similarly. (Rewrite U-rules) The soundness for these follows directly from
Definition 11.

Proposition 3. The set of decomposition rules given in Definition 17 is complete
in the following sense: If φ ∈ ℒ is a formula in a branch of a QC semantic
tableau, and there is a pair (E, A) such that (E, A) ⊨_s φ, and according to
Definition 10 there is a derivation of the form (E, A) ⊨_s φ implies (E, A) ⊨_s ψ,
then ψ can be obtained as a formula in the branch using the S-rules in Definition
17.

Proof. The strong satisfaction relation in Definition 10 is defined for non-literal
formulae by eleven equivalences. The first, which is for disjunction, is captured
by the two disjunction S-rules. The next six are captured by the six rewrite
S-rules. The last four, which are the quantification rules, are captured by the four
quantification S-rules.

Proposition 4. The set of decomposition rules given in Definition 18 is complete
in the following sense: If φ* is a formula in a branch of a QC semantic
tableau, and there is a pair (E, A) such that (E, A) ⊨_w φ*, and according to
Definition 11 there is a derivation of the form (E, A) ⊨_w φ* implies (E, A) ⊨_w ψ*,
then ψ* can be obtained as a formula in the branch using the U-rules in
Definition 18.

Proof. The weak satisfaction relation in Definition 11 is defined for non-literal
formulae by nine equivalences. The first, which is for disjunction, is captured
by the disjunction U-rule. The second, which is for conjunction, is captured by
the conjunction U-rule. The next three are captured by the three rewrite
U-rules. The last four, which are the quantification rules, are captured by the four
quantification U-rules.
Definition 21. Let $B$ be a branch of a QC semantic tableau where no further
decomposition rules can be applied. $F(B) \subseteq \mathcal{L}^*$ is the set of all the
formulae at the nodes in the branch. Also let $S(B) = F(B) \cap \mathcal{L}$ and let
$U(B) = F(B) - S(B)$.

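Proposition 5. For any set of formulae $\Delta \in \wp(\mathcal{L})$, and any formula
$\alpha \in \mathcal{L}$, and any branch $B$ of a QC semantic tableau for a database
$\Delta$ and a query $\alpha$, the branch $B$ is closed iff there is no model $E$ such
that $E \models_s \phi$ for all $\phi \in S(B)$ and $E \models_w \phi^*$ for all
$\phi^* \in U(B)$.

Definition 22. If $\Delta \in \wp(\mathcal{L})$ is of the form
$\{\alpha_1, \ldots, \alpha_n\}$ then $\bigwedge \Delta$ denotes the formula
$\alpha_1 \wedge \ldots \wedge \alpha_n \in \mathcal{L}$.

Proposition 6. For any set of formulae $\Delta \in \wp(\mathcal{L})$, and any formula
$\alpha \in \mathcal{L}$, there is a QC semantic tableau for a database $\Delta$ and a
query $\alpha$ that is closed iff there is no model $E$ such that
$E \models_s \bigwedge \Delta$ and $E \models_w \alpha^*$.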

Proof. First, according to Propositions 1 and 2, each application of a decomposition
rule is sound. Second, according to Propositions 3 and 4, the application of the
decomposition rules is complete. Now let us consider a particular $\Delta$ and
$\alpha$. There is a QC semantic tableau for a database $\Delta$ and a query $\alpha$
that is closed $\Leftrightarrow$ Every branch of the semantic tableau with root
$\Delta \cup \{\alpha^*\}$ is closed $\Leftrightarrow$ Every branch of the semantic
tableau with root $\Delta \cup \{\alpha^*\}$ contains $P$ and $P^*$ for some ground
literal $P$ $\Leftrightarrow$ There is no model for each branch of the semantic tableau
with root $\Delta \cup \{\alpha^*\}$ $\Leftrightarrow$ There is no model $E$ such that
$E \models_s \bigwedge \Delta$ and $E \models_w \alpha^*$.

Proposition 7. For any set of formulae $\Delta \in \wp(\mathcal{L})$, and any formula
$\alpha \in \mathcal{L}$, there is an open branch $B$ of a QC semantic tableau for a
database $\Delta$ and a query $\alpha$ iff there is a model $E$ such that
$E \models_s \bigwedge \Delta$ and $E \models_w \alpha^*$.

Proposition 8. For any set of formulae $\Delta \in \wp(\mathcal{L})$, and any formula
$\alpha \in \mathcal{L}$, a QC tableau for a database $\Delta$ and a query $\alpha$ is
closed iff $\Delta \models_Q \alpha$ holds.

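Proof. This follows directly from Proposition 6. Let us consider a particular $\Delta$
and $\alpha$. There is a QC semantic tableau for a database $\Delta$ and a query
$\alpha$ that is closed $\Leftrightarrow$ There is no model $E_i$ such that
$E_i \models_s \bigwedge \Delta$ and $E_i \models_w \alpha^*$ $\Leftrightarrow$ For all
models $E_j$, $E_j \not\models_s \bigwedge \Delta$ or $E_j \models_w \alpha$
$\Leftrightarrow$ For all models $E_j$, if $E_j \models_s \bigwedge \Delta$ then
$E_j \models_w \alpha$ $\Leftrightarrow$ $\Delta \models_Q \alpha$ holds.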

Proposition 9. The QC semantic tableau collapses to a classical semantic
tableau if the following rules are added to the decomposition rules,

$$\frac{\alpha}{(\neg\alpha)^*} \qquad \frac{\neg\alpha}{\alpha^*} \qquad \frac{\alpha^*}{\neg\alpha} \qquad \frac{(\neg\alpha)^*}{\alpha}$$

and we can use the classical definition for closure of a branch (i.e. the branch
contains both $P$ and $\neg P$ for some ground atom).
5 Discussion

Developing a non-trivializable, or paraconsistent logic, necessitates some com- 
promise, or weakening, of classical logic. The compromises imposed to give QC 
logic seem to be more appropriate than other paraconsistent logics for appli- 
cations in computing. QC logic provides a means to obtain all the non-trivial 
resolvents from a set of formulae, without the problem of trivial clauses also 
following. 

QC logic exhibits the nice feature that no attention needs to be paid to a 
special form that the formulae in a set of premisses should have, as long as each 
formula in the set is individually consistent and not a tautology. This is in con- 
trast with other paraconsistent logics where two formulae identical by definition 
of a connective in classical logic may not yield the same set of conclusions. 
Abstract. A great deal of research has been devoted to nontrivial reasoning
in inconsistent knowledge bases. Coherence-based approaches proceed
by a consolidation operation which selects several consistent subsets
of the knowledge base and an entailment operation which uses classical
implication on these subsets in order to conclude. An important advantage
of these formalisms is their flexibility: consolidation operations can
take into account the priorities of declarations stored in the base, and
different entailment operations can be distinguished according to the
cautiousness of reasoning. However, one of the main drawbacks of these
approaches is their high computational complexity. The purpose of our
study is to define a logical framework which handles this difficulty by
introducing the concepts of anytime consolidation and anytime entailment.
The framework is semantically founded on the notion of resource
which captures both the accuracy and the computational cost of anytime
operations. Moreover, a stepwise procedure is included for improving
approximations. Finally, both sound approximations and complete ones
are covered. Based on these properties, we show that an anytime view of
coherence-based reasoning is tenable.



1 Introduction 

A great deal of research has been devoted to nontrivial reasoning from inconsistency.
This problem arises in a number of areas in artificial intelligence, e.g., in
merging knowledge bases [1,2], defeasible reasoning [16,5] and belief revision
[13,14]. Most of the research on this issue is influenced by work in nonmonotonic
reasoning, in particular by Nebel [13,14], Pinkas and Loui [15], and Benferhat and
his colleagues [3,4], who developed the so-called coherence-based approaches. The
main idea of these techniques is to start with a knowledge base and to apply two
successive mechanisms, namely, a consolidation operation which generates and
selects several consistent subsets of the base and an entailment relation which
uses classical logic on the consistent subsets in order to conclude.

As noticed by Nebel in [14], an important advantage of coherence-based approaches
is their flexibility. Different classes of consolidation operations can be
distinguished according to the importance or relevance of declarations stored in
the knowledge base. For example, if priorities attached to declarations are
available, then a preference ordering may be defined on the consistent subsets of the
base and hence, the consolidation task has finer control over which declarations
are discarded and which declarations are kept. In an orthogonal
way, different classes of entailment operations can be distinguished according to
the cautiousness of reasoning. For example, the following kind of entailment is
considered in [2,5]: “a knowledge base $A$ entails the declaration $\alpha$ if, and only
if, $\alpha$ is classically inferred by all the preferred consistent subsets of $A$”. A
taxonomy of entailment operations, from credulous to skeptical ones, can be found in [15].

Unfortunately, one of the main drawbacks of coherence-based approaches is 
their high computational complexity. As stated in [7], the complexity of reason- 
ing in the propositional case lies at least at the second level of the polynomial 
hierarchy. This is due to the interaction of two sources of complexity, namely, 
propositional satisfiability and the selection of preferred consistent subsets. For 
this reason, one cannot expect to arrive at a polynomial algorithm when elimi- 
nating only one source, e.g., by restricting the base to Horn logic. 

Anytime reasoning is a technique which is used in many areas of artificial
intelligence to deal with the computational intractability of problems [20]. This
paradigm extends the traditional notion of reasoner by allowing it to return
many possible answers to any given query. An original method, primarily due to
Schaerf and Cadoli in [17], and recently generalized in [9], has received a great
deal of interest in the knowledge representation community. The basic idea of
the method is to define a family of inference relations by relaxing soundness or
completeness of reasoning. The knowledge base can provide partial solutions even
if stopped prematurely. The accuracy of the solution improves with the time used
in computing the solution. Several extensions of this method have been proposed
in the fields of modal logics [12] and first-order logic [10]. However, with few
exceptions (e.g. [6]), most of the studies in anytime reasoning have concentrated
on the monotonic case. In particular, it is necessary to make formal steps in the
direction of coherence-based reasoning.

The purpose of this paper is to develop a logic-oriented framework for anytime
coherence-based reasoning. Our formalism is based on a multi-modal propositional
logic, presented in [9], and used to specify anytime monotonic reasoners.
In this study, we extend our previous work in order to specify anytime non- 
monotonic reasoners. Starting from a knowledge base A and a preordering on A, 
we introduce the notion of anytime consolidation, an operation which generates 
and selects approximate preferred consistent subsets of A. Then, we define three 
classes of anytime entailment relations, which respectively incorporate the cred- 
ulous principle, the skeptical principle and the argumentative principle. Based 
on these operations, we show that an anytime view of coherence-based reasoning 
is tenable. Specifically, our framework includes the following features: 

— The logic is semantically founded on the notion of resource which reflects 
both the accuracy and the computational cost of the approximations. 

— The framework enables incremental reasoning: the quality of approximations 
is a nondecreasing function of the resources that have been spent. 

— The framework covers dual reasoning: both sound but incomplete and com- 
plete but unsound approximations are returned at any step. 







F. Koriche 



The rest of the paper is organized as follows. Section 2 presents the logical 
machinery for anytime monotonic reasoners. Our main contribution lies in sec- 
tion 3 which is devoted to the formalization of anytime nonmonotonic reasoners. 
Finally, section 4 suggests some topics for future research. 

2 Anytime Monotonic Reasoning 

In this section, we focus on the formalization of anytime monotonic reasoners.
For this purpose, we present a propositional logic, named ARL, for anytime
reasoning. We first define its syntax, then examine its semantics, and
finally present some interesting properties of the logic.

2.1 Syntax 

Throughout this paper, we consider a nonempty and finite set of atomic propositions
(atoms for short) $P$. The language of declarations is the smallest set built
from $P$ and closed under the connectives $\wedge$, $\vee$ and $\neg$. The connective
$\supset$ is defined in terms of $\neg$ and $\vee$; that is, $\alpha \supset \beta$ is an
abbreviation of $\neg\alpha \vee \beta$. Given a declaration $\alpha$, the set of atoms
that occur in $\alpha$ is denoted $P(\alpha)$. A literal is an atom or its negation and a
clause is a finite disjunction of literals. A knowledge base is a finite conjunction
of clauses. When there is no risk of confusion, we shall model knowledge bases as
sets of clauses.

Following [17], the concept of computational resource is captured by a pa- 
rameter S, a subset of P. Intuitively, the parameter S corresponds to a limited 
and controlled exploration in the space of possibilities defined from P. 

The main contribution of the logic relies on two families of modalities $\Box_S$
and $\Diamond_S$, defined for each subset $S$ of $P$. The operator $\Box_S$ is to
capture sound but incomplete inference and $\Diamond_S$ to capture complete but unsound
inference. The language of ARL is defined by the smallest set of sentences built from
the following rules: if $\alpha$ is a declaration then $\alpha$ is a sentence, if
$\alpha$ and $\beta$ are sentences then $\neg\alpha$, $\alpha \wedge \beta$ and
$\alpha \vee \beta$ are sentences, and if $\alpha$ is a declaration and $S$ is a
subset of $P$ then $\Box_S \alpha$ and $\Diamond_S \alpha$ are sentences. Intuitively,
a sentence such as $\Box_S \alpha$ is read “the agent knows $\alpha$ given the resources
$S$”. Dually, $\Diamond_S \alpha$ is read “the agent considers $\alpha$ as possible
given the resources $S$”.

2.2 Semantics 

In the context of limited reasoning, the four-valued semantics first proposed by
Belnap and notably studied in [11] meets our needs. The domain T of truth
values is the powerset of {0, 1}. So, in the logic ARL, sentences can be valued 
to be true, false, both, or neither. Based on this structure, we define a valuation 
as a total function v from P to T. The space of valuations generated from P is 
denoted V. A possible world is a valuation which maps every atom p of P into 
{1} or {0}. The space of possible worlds generated from P is denoted W. 

The notion of resource is semantically represented by an equivalence relation
between valuations. Given a parameter $S$, we say that two valuations $v$ and $v'$
are $S$-equivalent and write $v \sim_S v'$, iff for every atom $p \in P$, if $p \in S$
then $v(p) = v'(p)$. Intuitively, a relation of $S$-equivalence induces a partition of
the set $V$ into equivalence classes whose granularity captures the accuracy of
approximation. When $S$ increases, the partition becomes “finer” and the approximation
more precise. The “coarsest” partition is obtained when $S$ is the empty set; in this
case, $\sim_S$ is the total relation over $V$. Conversely, the “finest” partition is
given when $S$ is the set $P$; in this case $\sim_S$ is the identity relation over $V$.

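An interpretation of ARL consists of a truth support relation $\models_1$ and a falsity
support relation $\models_0$ inductively defined by the following conditions:

    $v \models_1 p$ iff $1 \in v(p)$,                                          (1)
    $v \models_0 p$ iff $0 \in v(p)$,

    $v \models_1 \neg\alpha$ iff $v \models_0 \alpha$,                          (2)
    $v \models_0 \neg\alpha$ iff $v \models_1 \alpha$,

    $v \models_1 \alpha \wedge \beta$ iff $v \models_1 \alpha$ and $v \models_1 \beta$,   (3)
    $v \models_0 \alpha \wedge \beta$ iff $v \models_0 \alpha$ or $v \models_0 \beta$,

    $v \models_1 \alpha \vee \beta$ iff $v \models_1 \alpha$ or $v \models_1 \beta$,      (4)
    $v \models_0 \alpha \vee \beta$ iff $v \models_0 \alpha$ and $v \models_0 \beta$,

    $v \models_1 \Box_S \alpha$ iff $\forall v' \in V$, if $v \sim_S v'$ then $v' \models_1 \alpha$,   (5)
    $v \models_0 \Box_S \alpha$ iff $v \not\models_1 \Box_S \alpha$,

    $v \models_1 \Diamond_S \alpha$ iff $\exists v' \in V$ such that $v \sim_S v'$ and $v' \models_1 \alpha$,   (6)
    $v \models_0 \Diamond_S \alpha$ iff $v \not\models_1 \Diamond_S \alpha$.
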



A sentence $\alpha$ is satisfiable iff there exists a possible world $w$ such that
$w \models_1 \alpha$. We say that $\alpha$ is valid, and write $\models \alpha$, iff for
every $w \in W$, $w \models_1 \alpha$ holds. Given two sentences $\alpha$ and $\beta$, we
say that $\beta$ is a logical consequence of $\alpha$ iff $\models \alpha \supset \beta$
holds. A sound and complete axiomatization for ARL can be found in [9].
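
The truth and falsity support conditions above can be read almost directly as a recursive evaluator. The following Python sketch is only an illustration under stated assumptions: the tuple encoding of sentences (`("atom", p)`, `("not", f)`, `("and", f, g)`, `("or", f, g)`, `("box", S, f)`, `("dia", S, f)`) and all function names are hypothetical and do not come from the paper, and the modalities are evaluated by naive enumeration of all $S$-equivalent four-valued valuations.

```python
from itertools import product

# Belnap's four truth values, encoded as subsets of {0, 1}.
VALUES = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]


def s_equivalent(v, atoms, S):
    """Yield every four-valued valuation that agrees with v on the atoms of S."""
    free = [p for p in atoms if p not in S]
    for combo in product(VALUES, repeat=len(free)):
        u = dict(v)
        u.update(zip(free, combo))
        yield u


def supports(v, f, truth, atoms):
    """Return True iff v |=_1 f (truth=True) or v |=_0 f (truth=False)."""
    op = f[0]
    if op == "atom":
        return (1 if truth else 0) in v[f[1]]
    if op == "not":
        return supports(v, f[1], not truth, atoms)
    if op in ("and", "or"):
        left = supports(v, f[1], truth, atoms)
        right = supports(v, f[2], truth, atoms)
        # Truth of a conjunction and falsity of a disjunction need both sides.
        needs_both = (op == "and") == truth
        return (left and right) if needs_both else (left or right)
    if op in ("box", "dia"):
        S, body = f[1], f[2]
        results = (supports(u, body, True, atoms) for u in s_equivalent(v, atoms, S))
        holds = all(results) if op == "box" else any(results)
        # Falsity support of a modal sentence is the absence of truth support.
        return holds if truth else not holds
    raise ValueError(f"unknown connective: {op}")


def possible_worlds(atoms):
    """Yield the two-valued valuations (every atom mapped to {0} or {1})."""
    for combo in product([frozenset({0}), frozenset({1})], repeat=len(atoms)):
        yield dict(zip(atoms, combo))


def satisfiable(f, atoms):
    return any(supports(w, f, True, atoms) for w in possible_worlds(atoms))


def valid(f, atoms):
    return all(supports(w, f, True, atoms) for w in possible_worlds(atoms))
```

With a single atom $p$, this evaluator makes $\Box_S (p \vee \neg p)$ valid for $S = \{p\}$ but not for $S = \emptyset$, since a valuation that assigns the empty set to $p$ supports neither disjunct.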



2.3 Properties 

After an excursion into the logic ARL, we now focus on its main properties. To
this end, we specify an anytime monotonic reasoner as a function that takes
as input a knowledge base $A$, a parameter $S$ and a declaration $\alpha$, and returns
“yes” if $\models \Box_S (A \supset \alpha)$, “no” if
$\not\models \Diamond_S (A \supset \alpha)$, and “unknown” otherwise.

Interestingly, our model can be shown to be incremental and dual. Specifically,
the reasoning process may be defined by an increasing sequence of parameters
$S_0 = \emptyset \subseteq \cdots \subseteq S_k \subseteq \cdots \subseteq S_n = P$ that
approximate the problem of deciding whether $\alpha$ is a logical consequence of $A$,
or not, by means of two dual families of tests $\models \Box_{S_k} (A \supset \alpha)$
and $\models \Diamond_{S_k} (A \supset \alpha)$. If the reasoner returns “yes” using
any operator $\Box_{S_k}$ then $\alpha$ is a consequence of $A$. Dually, if the reasoner
answers “no” using any operator $\Diamond_{S_k}$, then $\alpha$ is not a consequence of
$A$. This stepwise process has the important advantage that the iteration may be
stopped when a confirming answer is already obtained for a small index $k$.
Theorem 1. For any declaration $\alpha$ and any parameters $S$ and $S'$ s.t.
$S \subseteq S'$,

    if $\models \Box_S \alpha$ then $\models \Box_{S'} \alpha$ and hence $\models \alpha$,                    (1)
    if $\not\models \Diamond_S \alpha$ then $\not\models \Diamond_{S'} \alpha$ and hence $\not\models \alpha$.   (2)

Lemma 1. For any declaration $\alpha$,

    $\models \Box_S \alpha$ iff $\Diamond_S \neg\alpha$ is unsatisfiable,        (1)
    $\not\models \Diamond_S \alpha$ iff $\Box_S \neg\alpha$ is satisfiable.      (2)

Theorem 2. For any declaration $\alpha$ and any $S$, there is an algorithm for deciding
whether $\Box_S \alpha$ is satisfiable and $\Diamond_S \alpha$ is satisfiable which runs
in $O(|\alpha| \cdot 2^{|S|})$.
The above complexity result is just the worst-case upper bound of an enumeration
algorithm. Actually, in the case of clausal knowledge bases, one may
conceive a two-phase procedure which first simplifies the initial knowledge base
and next explores the resulting search space. The simplification phase proceeds
as follows. In the scope of the modality $\Diamond_S$, the algorithm deletes all
clauses of $\alpha$ that contain a literal whose atom does not occur in $S$. Dually, in
the scope of $\Box_S$, the algorithm eliminates from every clause of $\alpha$ all
literals whose atom does not occur in $S$. Since any atom in the resulting theory
occurs in $S$, the exploration phase consists in a standard (two-valued) satisfiability
algorithm. Systematic methods such as depth-first search enumeration [19] can be used
to compute at the same time the satisfiability of $\Box_S \alpha$ and the
unsatisfiability of $\Diamond_S \alpha$. On the other hand, local search algorithms [18]
can be exploited if we concentrate on the satisfiability of $\Box_S \alpha$. In a
nutshell, the role of the simplification phase is to reduce the dimensions of the
formula, thus gaining efficiency in the exploration phase.

The correct choice of $S$ is crucial for the usefulness of deduction. Taken to
the extreme, when $S$ is chosen incorrectly, anytime reasoning may end up as
expensive as classical reasoning. From this perspective, several heuristics have
been proposed in the literature. For example, the atoms of $S$ may be dynamically
chosen using the diversity heuristic advocated in [8]. The diversity of an atom $p$
is the product of the number of positive occurrences by the number of negative
occurrences of $p$ in the theory. This notion is based on the observation that an
atom is a potential source of unsatisfiability only when it appears both positively
and negatively in different clauses. Thus, in the scope of the modality $\Diamond_S$,
the strategy consists in choosing atoms whose diversity is maximal. Dually, in the
scope of $\Box_S$, the algorithm iteratively selects atoms whose diversity is minimal.
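
The diversity measure described above is easy to compute for a clausal theory. The sketch below is a minimal illustration, assuming a hypothetical encoding in which a clause is a tuple of string literals and a leading `-` marks negation; the helper names `diversity` and `pick_atom` are not from the paper. It scores atoms statically on the given clauses, whereas the text describes a dynamic choice on the progressively simplified theory.

```python
from collections import Counter


def diversity(clauses):
    """Map each atom to (#positive occurrences) * (#negative occurrences)."""
    pos, neg = Counter(), Counter()
    for clause in clauses:
        for lit in clause:
            if lit.startswith("-"):
                neg[lit[1:]] += 1
            else:
                pos[lit] += 1
    return {p: pos[p] * neg[p] for p in set(pos) | set(neg)}


def pick_atom(clauses, S, maximal):
    """Pick an atom outside S with maximal (or minimal) diversity.

    Ties are broken alphabetically; returns None when every atom is in S.
    """
    scores = {p: d for p, d in diversity(clauses).items() if p not in S}
    if not scores:
        return None
    return min(scores, key=lambda p: (-scores[p] if maximal else scores[p], p))


# Clause set in the assumed encoding: (-a v b v c), (a v b v -d), (a v -b v d), (-a v -b v c).
KB = [("-a", "b", "c"), ("a", "b", "-d"), ("a", "-b", "d"), ("-a", "-b", "c")]
```

On this clause set, atom `c` never occurs negatively, so its diversity is zero and it is the first pick under the minimal-diversity strategy.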

Example 1. Let $A = \{(\neg a \vee b \vee c), (a \vee b \vee \neg d), (a \vee \neg b \vee d), (\neg a \vee \neg b \vee c)\}$.
We want to show that $A$ is satisfiable. We need to find a subset $S$ of
$\{a, b, c, d\}$ s.t. $\Box_S A$ is satisfiable. Starting with $S = \emptyset$ and using
the minimal diversity heuristic, we gradually add $c$ and $a$ to $S$. This is
sufficient for proving that $A$ is satisfiable.

Example 2. Suppose we want to show that $a \supset c$ is a logical consequence of
the knowledge base $A$, defined above. We need to find a subset $S$ such that the
sentence $\Diamond_S (A \wedge a \wedge \neg c)$ is unsatisfiable. Using the maximal
diversity strategy, we iteratively add $a$, $b$ and $c$ to $S$. This is sufficient for
proving that $(A \wedge a \wedge \neg c)$ is unsatisfiable. So, $a \supset c$ is indeed
a logical consequence of $A$.
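
The two-phase procedure and the two examples can be replayed with a small sketch. It assumes the same hypothetical clause encoding (tuples of string literals, `-` for negation), restricts queries to single clauses, and uses plain enumeration for the exploration phase rather than the search methods cited in the text. The "yes" and "no" tests rely on Lemma 1: $\models \Box_S (A \supset \alpha)$ is checked as unsatisfiability of $\Diamond_S (A \wedge \neg\alpha)$, and $\not\models \Diamond_S (A \supset \alpha)$ as satisfiability of $\Box_S (A \wedge \neg\alpha)$. All function names are illustrative.

```python
from itertools import product


def atom_of(lit):
    return lit.lstrip("-")


def negate(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit


def box_simplify(clauses, S):
    """Scope of box_S: drop every literal whose atom lies outside S."""
    return [tuple(l for l in c if atom_of(l) in S) for c in clauses]


def dia_simplify(clauses, S):
    """Scope of dia_S: drop every clause containing a literal whose atom lies outside S."""
    return [c for c in clauses if all(atom_of(l) in S for l in c)]


def classical_sat(clauses, atoms):
    """Exhaustive two-valued satisfiability test over the given atoms."""
    atoms = sorted(atoms)
    for bits in product([False, True], repeat=len(atoms)):
        w = dict(zip(atoms, bits))
        # A literal holds when the atom's value differs from its negation flag.
        if all(any(w[atom_of(l)] != l.startswith("-") for l in c) for c in clauses):
            return True
    return False


def box_sat(clauses, S):
    return classical_sat(box_simplify(clauses, S), S)


def dia_sat(clauses, S):
    return classical_sat(dia_simplify(clauses, S), S)


def anytime_query(kb, query, schedule):
    """Walk an increasing list of parameters; stop at the first definite answer."""
    refutation = list(kb) + [(negate(l),) for l in query]
    for S in schedule:
        if not dia_sat(refutation, S):
            return "yes", S   # dia_S (kb and not query) is unsatisfiable
        if box_sat(refutation, S):
            return "no", S    # box_S (kb and not query) is satisfiable
    return "unknown", None


KB = [("-a", "b", "c"), ("a", "b", "-d"), ("a", "-b", "d"), ("-a", "-b", "c")]
```

Here `box_sat(KB, {"a", "c"})` corresponds to Example 1, and `anytime_query(KB, ("-a", "c"), [{"a"}, {"a", "b"}, {"a", "b", "c"}])` corresponds to Example 2, where the clause `("-a", "c")` encodes $a \supset c$.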
3 Anytime Nonmonotonic Reasoning

In this section, we extend the concepts developed so far to the formalization of
anytime nonmonotonic reasoners. In the setting suggested by our approach, these
systems are defined in terms of anytime consolidation and anytime entailment.
The quality of the result of each operation depends on the computational resources
that have been spent. We first define the concept of anytime consolidation,
then present three classes of anytime entailment, and finally examine the
computational properties of our framework.

3.1 Anytime Consolidation Operations 

As considered for instance in [13,14], a “standard” consolidation operation starts 
from a knowledge base and a priority ordering on this base and selects the 
preferred consistent subsets of the base. The purpose of “anytime” consolidation 
is to control the generation of these subsets by the notion of resource parameter. 

To this end, we need some additional definitions. A prioritized knowledge base
is a pair $(A, \leq)$ where $A$ is a knowledge base and $\leq$ is a total preorder on
$A$. It is equivalent to consider that $A$ is stratified in a collection
$(A_1, \cdots, A_n)$, where $A_1$ contains the declarations of highest priority and
$A_n$ those of lowest priority. Each knowledge base $A_i$ is called a stratum of $A$.
The structure $(A, \leq)$ is called flat if the relation $\leq$ is symmetric, or
equivalently, if $A$ contains a unique stratum. Different methods have been proposed
to use the priority relation in order to select “preferred” consistent subsets (see
e.g. [3]). In this study, we focus on the inclusion-based preference ordering, denoted
$\preceq$, whose strict part is defined as follows: $B \prec C$ iff
$\exists i : B \cap A_i \subset C \cap A_i$ and $\forall j : 1 \leq j < i$,
$B \cap A_j = C \cap A_j$. By extension, $B \preceq C$ iff $B \prec C$ or $B = C$.
Based on these considerations, the standard consolidation operation, denoted $\Delta$,
is defined as follows:

    $\Delta(A, \leq) = \max(\{B \subseteq A : B \text{ is satisfiable}\}, \preceq)$.

Now we incorporate the notion of computational resource. A parameter $S$ is
said to be acceptable for a prioritized knowledge base $(A, \leq)$ iff the following
condition holds: if $\exists i : S \cap P(A_i) \neq \emptyset$ then
$\forall j : 1 \leq j < i$, $P(A_j) \subseteq S$. Intuitively,
the acceptability condition imposes a restriction on the choice of computational
resources: if an acceptable parameter contains at least one atom of any given
stratum then it must contain all atoms of strata of higher priority. In particular,
it is interesting to remark that if the structure $(A, \leq)$ is flat, then every
subset of $P$ is acceptable for $(A, \leq)$. The anytime view of consolidation is
realized by parameterizing the operation $\Delta$ by means of two families of
operations $\Box$ and $\Diamond$, the first one being sound and the second one being
complete with respect to standard consolidation. The corresponding anytime
consolidation operations are defined as follows:

    $\Box(A, \leq, S) = \max(\{B \subseteq A : \Box_S B \text{ is satisfiable}\}, \preceq)$,

    $\Diamond(A, \leq, S) = \max(\{B \subseteq A : \Diamond_S B \text{ is satisfiable}\}, \preceq)$.
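
For small clausal bases, the two anytime consolidation operations and the acceptability condition can be prototyped by brute force. The sketch below is only illustrative and rests on assumptions that are not in the paper: a prioritized base is a list of strata (highest priority first), each stratum is a list of distinct clauses, a clause is a tuple of string literals with `-` for negation, and every subset of the base is enumerated, which is exponential in the size of the base.

```python
from itertools import combinations, product


def modal_sat(clauses, S, modality):
    """Satisfiability of box_S / dia_S over a clause set, by simplification then enumeration."""
    def inside(lit):
        return lit.lstrip("-") in S
    if modality == "box":
        reduced = [tuple(l for l in c if inside(l)) for c in clauses]
    else:
        reduced = [c for c in clauses if all(inside(l) for l in c)]
    atoms = sorted(S)
    worlds = (dict(zip(atoms, bits)) for bits in product([False, True], repeat=len(atoms)))
    return any(
        all(any(w[l.lstrip("-")] != l.startswith("-") for l in c) for c in reduced)
        for w in worlds
    )


def acceptable(strata, S):
    """If S meets the atoms of a stratum, it must contain all atoms of higher-priority strata."""
    def atoms_of(stratum):
        return {l.lstrip("-") for c in stratum for l in c}
    return all(
        not (S & atoms_of(stratum)) or all(atoms_of(h) <= S for h in strata[:i])
        for i, stratum in enumerate(strata)
    )


def strictly_preferred(B, C, strata):
    """B < C: at the first stratum where B and C differ, C strictly extends B."""
    for stratum in strata:
        b, c = B & set(stratum), C & set(stratum)
        if b != c:
            return b < c
    return False


def consolidate(strata, S, modality):
    """Preferred subsets that are satisfiable under box_S ("box") or dia_S ("dia")."""
    clauses = [c for stratum in strata for c in stratum]
    candidates = [
        frozenset(sub)
        for r in range(len(clauses) + 1)
        for sub in combinations(clauses, r)
        if modal_sat(sub, S, modality)
    ]
    return [B for B in candidates
            if not any(strictly_preferred(B, C, strata) for C in candidates)]


# Hypothetical two-stratum base: {a} has priority over {-a v b, -b}.
STRATA = [[("a",)], [("-a", "b"), ("-b",)]]
```

With $S = P = \{a, b\}$ both operations coincide with standard consolidation on this base; shrinking $S$ makes the `"box"` family select smaller subsets and the `"dia"` family larger ones.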

The following lemmas capture important properties of anytime consolidation.
They will be frequently used in the remainder of the paper.
Lemma 2. For any prioritized knowledge base $(A, \leq)$ and any acceptable
parameters $S$ and $S'$ such that $S \subseteq S'$:

    $\forall B \in \Box(A, \leq, S)$ $\exists C \in \Box(A, \leq, S')$ such that $B \subseteq C$,            (1)

    $\forall B \in \Diamond(A, \leq, S')$ $\exists C \in \Diamond(A, \leq, S)$ such that $B \subseteq C$.    (2)

Proof. Let us examine part (1). Assume that there exists a knowledge base
$B \in \Box(A, \leq, S)$ such that for every base $C \in \Box(A, \leq, S')$, we have
$B \not\subseteq C$. We show that this leads to a contradiction. If
$B \in \Box(A, \leq, S)$ then $\Box_S B$ is satisfiable. By application of theorem 1
and lemma 1, it follows that $\Box_{S'} B$ is satisfiable. Since
$B \notin \Box(A, \leq, S')$ there must exist a base $C \in \Box(A, \leq, S')$ such
that $B \prec C$. Therefore, $\exists i : B \cap A_i \subset C \cap A_i$ and
$\forall j : 1 \leq j < i$, $B \cap A_j = C \cap A_j$. By assumption, we know that
$B \not\subseteq C$. So, $\exists k > i : B \cap A_k \not\subseteq C \cap A_k$. Thus, it
follows that $B \cap A_k \neq \emptyset$. Since $\Box_S B$ is satisfiable, we must have
$S \cap P(A_k) \neq \emptyset$. By the acceptability condition, it follows that
$\forall k' < k$, $P(A_{k'}) \subseteq S$. Let $B'$ denote the set
$\bigcup \{C \cap A_{k'} : k' < k\}$. Obviously, $\Box_S B'$ is satisfiable. Moreover,
it is clear that $B \prec B'$. Therefore, we obtain $B \notin \Box(A, \leq, S)$, but
this contradicts the initial hypothesis. A dual argument applies to part (2).



Lemma 3. For any knowledge base $A$, any clause $\alpha$ and any parameters $S$ and
$S'$ such that $S \subseteq S'$,

1. if $\Box_{S'} A$ is satisfiable and $\Box_{S'} A \cup \{\alpha\}$ is unsatisfiable,
then there exists a subset $B$ of $A$ such that $\Box_S B$ is satisfiable and
$\Box_S B \cup \{\alpha\}$ is unsatisfiable.

2. if $\Diamond_S A$ is satisfiable and $\Diamond_S A \cup \{\alpha\}$ is unsatisfiable,
then there exists a subset $B$ of $A$ such that $\Diamond_{S'} B$ is satisfiable and
$\Diamond_{S'} B \cup \{\alpha\}$ is unsatisfiable.

Proof. Let us examine part (1). If $\Box_S \alpha$ is unsatisfiable then
$B = \emptyset$ and we have demonstrated the property. Now, suppose that
$\Box_S \alpha$ is satisfiable. Thus, there exists a literal $l$ in $\alpha$ such that
its atom belongs to $S$. Moreover, since $\Box_{S'} A$ is satisfiable and
$\Box_{S'} A \cup \{\alpha\}$ is unsatisfiable, there exists a clause $\beta$ in $A$
such that the negation of $l$ belongs to $\beta$. So, $\Box_S \beta$ is satisfiable. Let
$\gamma$ denote the resolvent of $\alpha$ and $\beta$. If $\Box_S \gamma$ is
unsatisfiable, then $B = \{\beta\}$. Otherwise, there exists a literal $l'$ in $\gamma$
such that its atom belongs to $S$. Thus, there exists a clause $\beta'$ in $A$ that
contains the negation of $l'$. Since $\gamma$ does not contain any occurrence of $l$,
it is clear that $P(l') \cap P(l) = \emptyset$. Therefore, $\Box_S \beta \wedge \beta'$
is satisfiable. Let $\gamma'$ denote the resolvent of $\gamma$ and $\beta'$. If
$\Box_S \gamma'$ is unsatisfiable then $B = \{\beta, \beta'\}$. Otherwise, we
iteratively apply the same method until we obtain all the clauses of $A$. In this
case, $\Box_S A$ is satisfiable. An analogous strategy applies to part (2).



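Lemma 4. For any prioritized knowledge base $(A, \leq)$ and any acceptable
parameters $S$ and $S'$ such that $S \subseteq S'$:

    $\forall B \in \Box(A, \leq, S')$ $\exists C \in \Box(A, \leq, S)$ such that $C \subseteq B$,            (1)

    $\forall B \in \Diamond(A, \leq, S)$ $\exists C \in \Diamond(A, \leq, S')$ such that $C \subseteq B$.    (2)
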
Proof. Let us examine part (1). Suppose we have $B \in \Box(A, \leq, S')$. If $B = A$
the demonstration is straightforward. Now, suppose that $B \subset A$. We know
that $\Box_{S'} B$ is satisfiable. Moreover, for every clause $\beta$ in
$A \setminus B$, $\Box_{S'} B \cup \{\beta\}$ is unsatisfiable. Let $\alpha$ denote the
clause $\bigvee \{\beta : \beta \in A \setminus B\}$. Obviously,
$\Box_{S'} B \cup \{\alpha\}$ is unsatisfiable. By application of lemma 3, there exists
a subset $B'$ of $B$ such that $\Box_S B'$ is satisfiable and
$\Box_S B' \cup \{\alpha\}$ is unsatisfiable. Clearly enough, $B'$ can be extended to a
set $C$ such that $C \in \Box(B, \leq, S)$. Suppose that $C \notin \Box(A, \leq, S)$.
Then, there must exist a set $C' \in \Box(A, \leq, S)$ such that $C \prec C'$.
Therefore, $\exists i : C \cap A_i \subset C' \cap A_i$ and $\forall j : 1 \leq j < i$,
$C \cap A_j = C' \cap A_j$. Clearly, $C \subset C'$. Suppose not. In this case,
$\exists k > i$ such that $C \cap A_k \not\subseteq C' \cap A_k$. Thus, it follows
that $C \cap A_k \neq \emptyset$. Since $\Box_S C$ is satisfiable,
$P(A_k) \cap S \neq \emptyset$. Therefore, $\forall k' < k$, $P(A_{k'}) \subseteq S$. It
follows that $\forall k' < k$, $C \cap A_{k'} = B \cap A_{k'}$. Thus, we obtain
$B \cap A_i \subset C' \cap A_i$. So, $B \prec C'$. Since $\Box_{S'} C'$ is satisfiable,
$B \notin \Box(A, \leq, S')$, but this contradicts the initial hypothesis. So, we can
state that $C \subset C'$. Thus, $\exists \beta \in A \setminus B$ such that
$C \cup \{\beta\} \subseteq C'$. However, $\Box_S C \cup \{\beta\}$ is unsatisfiable.
Therefore $C' \notin \Box(A, \leq, S)$. So, $C \in \Box(A, \leq, S)$. Moreover, since
$C \in \Box(B, \leq, S)$, we obtain $C \subseteq B$, as desired. A dual argument applies
to part (2).

3.2 Anytime Entailment Operations 

In the setting of coherence-based reasoning, a “standard” entailment relation
takes as input a collection of preferred consistent subsets and returns as output
a set of cautious conclusions. A taxonomy of numerous entailment principles
has been established in [15] according to their cautiousness. In this study, we
are interested in three of them: the existential principle, the universal principle
and the argumentative principle. We first present these different classes of
entailment relations and then examine their corresponding approximations.

The first two entailment principles, introduced by Rescher and Manor in [16],
are the most commonly used in the presence of contradictory knowledge bases (see
e.g. [2,3]). They can be respectively described in the following way:

    $(A, \leq) \Vdash^{\exists} \alpha$ iff $\exists B \in \Delta(A, \leq)$ such that $\models B \supset \alpha$,

    $(A, \leq) \Vdash^{\forall} \alpha$ iff $\forall B \in \Delta(A, \leq)$, $\models B \supset \alpha$.

Obviously, universal entailment is more cautious than existential entailment,
since each conclusion obtained from $(A, \leq)$ using $\Vdash^{\forall}$ is also obtained
by $\Vdash^{\exists}$. In fact, universal entailment is often too conservative and hence
rather unproductive while existential entailment is often too permissive and may lead
to pairs of mutually exclusive conclusions. The notion of argumentative entailment,
suggested for instance in [4,15], is based on an intermediate principle which is more
productive than universal entailment but does not lead to contradictory conclusions.
It consists in keeping only the consequences obtained by the existential
principle whose negation cannot be inferred. In formal terms:

    $(A, \leq) \Vdash^{\mathcal{A}} \alpha$ iff $(A, \leq) \Vdash^{\exists} \alpha$ and $(A, \leq) \nVdash^{\exists} \neg\alpha$.

In the remainder of the paper, the symbol $x$ will be used to refer to one of the
entailment principles denoted by the symbols $\exists$, $\forall$ and $\mathcal{A}$.
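We now turn to the anytime view of entailment relations. The idea is to
approximate a standard nonmonotonic relation, say $\Vdash^{x}$, by means of two dual
families of relations $\Vdash^{x}_{\Box}$ and $\Vdash^{x}_{\Diamond}$, the first one
being sound and the second one being complete with respect to $\Vdash^{x}$. The notions
of anytime existential entailment and anytime universal entailment are defined as
follows:

    $(A, \leq, S) \Vdash^{\exists}_{\Box} \alpha$ iff $\exists B \in \Box(A, \leq, S)$ such that $\models \Box_S (B \supset \alpha)$,

    $(A, \leq, S) \Vdash^{\exists}_{\Diamond} \alpha$ iff $\exists B \in \Diamond(A, \leq, S)$ such that $\models \Diamond_S (B \supset \alpha)$,

    $(A, \leq, S) \Vdash^{\forall}_{\Box} \alpha$ iff $\forall B \in \Box(A, \leq, S)$, $\models \Box_S (B \supset \alpha)$,

    $(A, \leq, S) \Vdash^{\forall}_{\Diamond} \alpha$ iff $\forall B \in \Diamond(A, \leq, S)$, $\models \Diamond_S (B \supset \alpha)$.

The notion of anytime argumentative entailment is defined as follows:

    $(A, \leq, S) \Vdash^{\mathcal{A}}_{\Box} \alpha$ iff $(A, \leq, S) \Vdash^{\exists}_{\Box} \alpha$ and $(A, \leq, S) \nVdash^{\exists}_{\Diamond} \neg\alpha$,

    $(A, \leq, S) \Vdash^{\mathcal{A}}_{\Diamond} \alpha$ iff $(A, \leq, S) \Vdash^{\exists}_{\Diamond} \alpha$ and $(A, \leq, S) \nVdash^{\exists}_{\Box} \neg\alpha$.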
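
The three anytime entailment principles can be sketched on top of precomputed preferred subsets. The code below is an illustration under several assumptions that are not in the paper: clauses are tuples of string literals (`-` for negation), the argument `preferred` is a hypothetical dictionary mapping `"box"` and `"dia"` to the subsets returned by some consolidation routine for the same parameter $S$, queries are single clauses (single literals for the argumentative principle), and the validity of $\Box_S (B \supset \alpha)$ and $\Diamond_S (B \supset \alpha)$ is tested through Lemma 1 as unsatisfiability of $B \wedge \neg\alpha$ under the dual modality.

```python
from itertools import product


def negate(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit


def modal_sat(clauses, S, modality):
    """Satisfiability of box_S / dia_S over a clause set, by simplification then enumeration."""
    def inside(lit):
        return lit.lstrip("-") in S
    if modality == "box":
        reduced = [tuple(l for l in c if inside(l)) for c in clauses]
    else:
        reduced = [c for c in clauses if all(inside(l) for l in c)]
    atoms = sorted(S)
    worlds = (dict(zip(atoms, bits)) for bits in product([False, True], repeat=len(atoms)))
    return any(
        all(any(w[l.lstrip("-")] != l.startswith("-") for l in c) for c in reduced)
        for w in worlds
    )


def dual_of(modality):
    return "dia" if modality == "box" else "box"


def modal_valid_implication(B, query, S, modality):
    """Validity of modality_S (B > query), tested as unsatisfiability under the dual modality."""
    refutation = list(B) + [(negate(l),) for l in query]
    return not modal_sat(refutation, S, dual_of(modality))


def entails(preferred, query, S, modality, principle):
    """Anytime entailment for principle in {"exists", "forall", "arg"}."""
    def exists(q, m):
        return any(modal_valid_implication(B, q, S, m) for B in preferred[m])
    if principle == "exists":
        return exists(query, modality)
    if principle == "forall":
        return all(modal_valid_implication(B, query, S, modality)
                   for B in preferred[modality])
    if principle == "arg":
        if len(query) != 1:
            raise ValueError("this sketch only handles literal queries here")
        # Existential support for the query, none under the dual family for its negation.
        return exists(query, modality) and not exists((negate(query[0]),), dual_of(modality))
    raise ValueError(f"unknown principle: {principle}")


# Hypothetical preferred subsets for the base {a} > {-a v b, -b} with S = {a, b}.
B1 = frozenset({("a",), ("-a", "b")})
B2 = frozenset({("a",), ("-b",)})
PREFERRED = {"box": [B1, B2], "dia": [B1, B2]}
```

A reasoner in the sense of the text would answer "yes" when `entails(..., "box", x)` holds, "no" when `entails(..., "dia", x)` fails, and "unknown" otherwise. On the sample subsets, `b` is existentially but neither universally nor argumentatively entailed, while `a` is entailed under all three principles.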

We are now in a position to provide a specification tool for anytime nonmonotonic
reasoning. From this perspective, we define an anytime nonmonotonic reasoner as a
function that takes as input a prioritized knowledge base $(A, \leq)$, an acceptable
parameter $S$, a declaration $\alpha$ (i.e. the query) and an entailment principle $x$,
and returns as output “yes” if $(A, \leq, S) \Vdash^{x}_{\Box} \alpha$, “no” if
$(A, \leq, S) \nVdash^{x}_{\Diamond} \alpha$, and “unknown” otherwise. As for monotonic
deduction, the nonmonotonic reasoning process can be modeled by an increasing sequence
of parameters $(S_0 = \emptyset \subseteq \cdots \subseteq S_k \subseteq \cdots \subseteq S_n = P)$
that approximate the problem of deciding whether $(A, \leq) \Vdash^{x} \alpha$ holds, or
not, by means of two dual families of entailment tests
$(A, \leq, S_k) \Vdash^{x}_{\Box} \alpha$ and $(A, \leq, S_k) \Vdash^{x}_{\Diamond} \alpha$.
If the reasoner returns “yes” for a given index $k$, then $(A, \leq) \Vdash^{x} \alpha$
holds. On the other hand, if the reasoner answers “no” for a given $k$, then
$(A, \leq) \Vdash^{x} \alpha$ does not hold. These considerations are clarified by the
following properties.

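Theorem 3. For any prioritized knowledge base $(A, \leq)$, any declaration $\alpha$ and
any acceptable parameters $S$ and $S'$ such that $S \subseteq S'$,

    if $(A, \leq, S) \Vdash^{\exists}_{\Box} \alpha$ then $(A, \leq, S') \Vdash^{\exists}_{\Box} \alpha$ and hence $(A, \leq) \Vdash^{\exists} \alpha$,                 (1)

    if $(A, \leq, S) \nVdash^{\exists}_{\Diamond} \alpha$ then $(A, \leq, S') \nVdash^{\exists}_{\Diamond} \alpha$ and hence $(A, \leq) \nVdash^{\exists} \alpha$.     (2)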

Proof. Let us examine part (1). We first focus on the first implication. Suppose
that $(A, \leq, S) \Vdash^{\exists}_{\Box} \alpha$ holds. Then,
$\exists B \in \Box(A, \leq, S)$ such that $\models \Box_S (B \supset \alpha)$ holds. By
lemma 2, we know that $\exists C \in \Box(A, \leq, S')$ such that $B \subseteq C$. By the
monotonicity property of conjunction, it follows that
$\models \Box_S (C \supset \alpha)$. By application of theorem 1, it follows that
$\models \Box_{S'} (C \supset \alpha)$. Therefore, we obtain
$(A, \leq, S') \Vdash^{\exists}_{\Box} \alpha$, as desired. Now we turn to the second
implication of part (1). As before, we assume that
$(A, \leq, S') \Vdash^{\exists}_{\Box} \alpha$ holds. Since $S' \subseteq P$, it follows
that $(A, \leq, P) \Vdash^{\exists}_{\Box} \alpha$. By using the semantical properties of
$\sim_P$, we can easily verify that $\Box(A, \leq, P) = \Delta(A, \leq)$, and that
$\models \Box_P (A \supset \alpha)$ holds iff $\models A \supset \alpha$ holds. So,
$(A, \leq, P) \Vdash^{\exists}_{\Box} \alpha$ is logically equivalent to
$(A, \leq) \Vdash^{\exists} \alpha$. Therefore, it follows that
$(A, \leq) \Vdash^{\exists} \alpha$ holds, as desired. A dual strategy holds for part (2).
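Theorem 4. For any prioritized clausal knowledge base $(A, \leq)$, any declaration
$\alpha$ and any acceptable parameters $S$ and $S'$ such that $S \subseteq S'$,

    if $(A, \leq, S) \Vdash^{\forall}_{\Box} \alpha$ then $(A, \leq, S') \Vdash^{\forall}_{\Box} \alpha$ and hence $(A, \leq) \Vdash^{\forall} \alpha$,                 (1)

    if $(A, \leq, S) \nVdash^{\forall}_{\Diamond} \alpha$ then $(A, \leq, S') \nVdash^{\forall}_{\Diamond} \alpha$ and hence $(A, \leq) \nVdash^{\forall} \alpha$.     (2)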

Proof. We only examine the first implication of part (1). Suppose we are given
$(A, \leq, S) \Vdash^{\forall}_{\Box} \alpha$ and
$(A, \leq, S') \nVdash^{\forall}_{\Box} \alpha$. From the second assertion,
$\exists B \in \Box(A, \leq, S')$ such that $\not\models \Box_{S'} (B \supset \alpha)$.
By contraposition of theorem 1, it follows that $\not\models \Box_S (B \supset \alpha)$.
Moreover, since $B \in \Box(A, \leq, S')$, by application of lemma 4,
$\exists C \in \Box(A, \leq, S)$ such that $C \subseteq B$. By the monotonicity property
of conjunction, it follows that $\not\models \Box_S (C \supset \alpha)$. Therefore
$(A, \leq, S) \nVdash^{\forall}_{\Box} \alpha$, hence a contradiction.

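Theorem 5. For any prioritized knowledge base $(A, \leq)$, any declaration $\alpha$ and
any acceptable parameters $S$ and $S'$ such that $S \subseteq S'$,

    if $(A, \leq, S) \Vdash^{\mathcal{A}}_{\Box} \alpha$ then $(A, \leq, S') \Vdash^{\mathcal{A}}_{\Box} \alpha$ and hence $(A, \leq) \Vdash^{\mathcal{A}} \alpha$,                 (1)

    if $(A, \leq, S) \nVdash^{\mathcal{A}}_{\Diamond} \alpha$ then $(A, \leq, S') \nVdash^{\mathcal{A}}_{\Diamond} \alpha$ and hence $(A, \leq) \nVdash^{\mathcal{A}} \alpha$.     (2)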

Proof. We only examine the first implication of part (1). Suppose that {A,< 
,S) Ih^ Of. Then, (A, <,5) II-q a and (A, <,S') \y-% -la. From the first assertion 
and by theorem 3(1), it follows that {A, <, S') II-q a. From the second assertion 
and by theorem 3(2), it follows that {A, <, S') l)^|, ~^a. Thus, {A, <, S') Ih^ a. 

3.3 Computational Properties 

We now turn to computational considerations. Recall that coherence-based
reasoning is characterized by two interacting sources of complexity, namely,
propositional satisfiability and the selection of preferred consistent
subsets. The following theorem states that both sources of complexity
are bounded by the same resource parameter S.

Theorem 6. For any prioritized knowledge base {A, <), any declaration a and 
any parameter S, there exists an algorithm for deciding whether {A, <, S) Ih^ a 
holds and {A, <, S) a holds which runs in 0((|A| -|- |a|) • 2l'^l • 2l‘®l). 

Proof. We focus on the complexity analysis of {A, <, S) II-q a. The demonstra- 
tion is analogous for the other entailment relations. We begin to prove that the 
size of n(^, <, S) is bounded by 2l‘^L Let B and B' be two sets of D(T, <, S). 
Obviously, Og (B U B') is unsatisfiable. Let denotes the set of valuations v 
such that Vp G P, v{p) = {0} or v{p) = {1} if p G S', and v{p) = {} otherwise. 
Moreover, given a declaration (], let Fg (/3) denotes the set of valuations v in F^ 
such that V j3. Clearly, Dg {B U B') is unsatisfiable iff Vg{B) 0 Vg{B') = 0. 
Since there exists 2 1'® I valuations in Fj^, the maximum number of bases be- 
ing locally satisfiable and pairwise unsatisfiable under the scope of Dg is 2l'^l. 
Now, let us examine the main result. Suppose that if (A, <, S) II-q a holds then 
$\exists B \in \Box(A, <, S)$ such that $\models \Box_S (B \supset \alpha)$. By application of lemma 1 and
theorem 2, the validity test of $\Box_S (B \supset \alpha)$ is in $O((|A| + |\alpha|) \cdot 2^{|S|})$. Since there are
at most $2^{|S|}$ bases $B$, the entailment test is in $O((|A| + |\alpha|) \cdot 2^{|S|} \cdot 2^{|S|})$.
Several algorithms can be used for anytime nonmonotonic reasoning. The key
difficulty lies in the consolidation operation. To this end, one may conceive an
algorithm which takes as input a prioritized clausal base $(A, <)$ and computes
$\Box(A, <, S_k)$ by means of an increasing sequence $S_k$. For $k = 0$ the procedure
simply returns the empty base. For $k > 0$, the procedure proceeds in two
steps. First, for each subset $B$ of $\Box(A, <, S_{k-1})$, the procedure computes the
satisfiable expansions of $B$ that take clauses containing the literal $p_k$ or its
negation $\neg p_k$. Second, the procedure selects the maximal expansions and adds
them to $\Box(A, <, S_k)$. As far as $\Diamond(A, <, S_k)$ is concerned, dual considerations
hold. Such an algorithm is indeed incremental; by exploiting lemmas 2 and 4,
the procedure only needs to expand the maximal subsets generated in previous
steps and does not need to redo all computations from scratch.

The correct choice of S is crucial for the usefulness of anytime consolidation. 
This choice may be guided by the priority ordering <. Following the acceptability 
condition, the parameter is constructed by selecting the atoms from the stratum 
of highest priority, then the atoms of the next important stratum are added, and 
so on. Alternatively, inside each stratum, the choice of S may be heuristic. In 
this case, the letters are iteratively selected to minimize the predicted number 
of consistent subsets, using a strategy such as the minimal diversity heuristic. 
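
The stratum-by-stratum construction of the parameter can be sketched as follows.
This is only an illustration under stated assumptions: clauses are encoded as
sets of literal strings with "-" for negation, the names `atoms_of` and
`parameter_sequence` are hypothetical, and atoms inside a stratum are taken in
alphabetical order rather than by the minimal diversity heuristic. The data is
the prioritized base of Example 4 as we read it.

```python
from typing import FrozenSet, Iterable, List

Clause = FrozenSet[str]  # a clause is a set of literals; "-a" stands for "not a"


def atoms_of(clauses: Iterable[Clause]) -> List[str]:
    """Return the atoms occurring in the clauses, in sorted order."""
    return sorted({lit.lstrip("-") for clause in clauses for lit in clause})


def parameter_sequence(strata: List[List[Clause]]) -> List[FrozenSet[str]]:
    """Build S_0 = {} and then add atoms stratum by stratum, most important first."""
    sequence = [frozenset()]
    current: set = set()
    for stratum in strata:
        for atom in atoms_of(stratum):
            if atom not in current:
                current.add(atom)
                sequence.append(frozenset(current))
    return sequence


# Assumed encoding of Example 4: A1 = {a, -a, e}, A2 = {c, -d, -a v b, -c v d}.
A1 = [frozenset({"a"}), frozenset({"-a"}), frozenset({"e"})]
A2 = [frozenset({"c"}), frozenset({"-d"}), frozenset({"-a", "b"}), frozenset({"-c", "d"})]
SEQUENCE = parameter_sequence([A1, A2])
```

Each element of the sequence extends the previous one, so a consolidation
procedure can be interrupted after any step and resumed later with the next
parameter.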

Example 3. Consider the flat base A = {a, b, c, -ic, -^aV-'b, -•aVc, -■aV-'C, -•bVd}. 
We want to show that A Ih^ d. Hence, we need to find a set S such that A II-q d. 
Starting with S = 0 and using the minimal diversity heuristic, we iteratively 
add d and b to S. Based on the following results, we observe that A II-q d. 



$S$            $\Box(A, S)$
$\emptyset$    $\emptyset$
$\{d\}$        $\{\{\neg b \vee d\}\}$
$\{b, d\}$     $\{\{b, \neg b \vee d\}, \{\neg a \vee \neg b, \neg b \vee d\}\}$



Example 4- Suppose we are given the prioritized base A = (Ai, A2) where Ai = 
{a, -lO, e} and A2 = {c, -'d, -^a V bj-^cV d}. We want to show that (A, <) Ih- ^ b. 
So, we need to find a set S such that (A, <) Ih^ b. Starting with S = 0 and using 
the acceptability condition, we first add the atoms a and e and next we select b. 
Based on the following results, we indeed obtain (A, <) II-q b and (A, <) -•b. 



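$S$              $\Box(A, <, S)$                                                $\Diamond(A, <, S)$
$\emptyset$      $\emptyset$                                                    $A$
$\{a, e\}$       $\{\{a, e\}, \{\neg a, e, \neg a \vee b\}\}$                   $\{\{a, e\} \cup A_2, \{\neg a, e\} \cup A_2\}$
$\{a, b, e\}$    $\{\{a, e, \neg a \vee b\}, \{\neg a, e, \neg a \vee b\}\}$    $\{\{a, e\} \cup A_2, \{\neg a, e\} \cup A_2\}$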



4 Conclusion 

In this paper, we have studied the problem of reasoning from inconsistency,
focusing on the so-called coherence-based approaches. One of the main drawbacks
of these methods is their high computational complexity. Our aim was to provide
a logical framework which tackles this difficulty through the paradigm of
anytime computation. We have illustrated that the framework integrates several
major features: resource-bounded reasoning, incrementality and dual reasoning.
Future directions of this work include the empirical study of anytime
coherence-based reasoning. To this end, some benchmarks for coherence-based
reasoning have recently been proposed in [7]. An important issue is to compare
the performance of the standard methods with that of our anytime technique.
Abstract. This paper provides a logical analysis of conflicts between 
informational, motivational and deliberative attitudes such as beliefs, 
obligations, intentions, and desires. The contributions are twofold. First, 
conflict resolutions are classified based on agent types, and formalized in 
an extension of Reiter’s normal default logic. Second, several desiderata 
for conflict resolutions are introduced, discussed and tested on the logic. 
The results suggest that Reiter’s default logic is too strong, in the sense 
that a weaker notion of extension is needed to satisfy the desiderata. 



1 Introduction 

Various competing agent decision models have been proposed, and it is still 
unclear which type of model should be used in which type of application. For 
example, some decision models are based on goal-based planning or on variants of 
decision theory like qualitative decision theory [13,1], other models are based on 
cognitive models like belief-desire-intention models [5,14], and yet other models 
are based on social concepts like obligations and norms [6,20,19], as in deontic 
action programs [8] . Typically, the decision model is based on an attempt to reach 
goals, satisfy desires, or fulfill obligations. In the Belief-Obligation-Intention- 
Desire or BOID architecture [4] decision models are considered in which the 
main problem is not finding out how to reach goals, satisfy desires or fulfill 
obligations, but in which the main problem is to resolve conflicts between them. 

The BOID logic discussed in this paper is an abstraction of the BOID ar- 
chitecture. For conflicts so-called extensions are constructed and one extension 
is selected, an idea adopted from Thomason’s BDP logic [18], which is in turn 
based on Reiter’s default logic [15]. In particular, BDP logic is based on conflict 
resolution for conditional beliefs and desires, which is extended in the BOID logic 
with conditional obligations and intentions borrowed from respectively deontic 
action programs [8] and BDI logic [5,14]. The BOID logic is an abstraction from 
the BOID architecture, in the sense that in the latter the components may not 
contain rules or be based on propositional logic, and in case of limited resources 
the extensions may not be fixpoints. 
The contributions of this paper are twofold: 

1. We give a classification of conflict resolutions between conditional beliefs, 
obligations, intentions, and desires. Extending the BDP logic with obliga- 
tions and intentions increases the number of possible conflicts dramatically. 
In all realistic conflict resolutions beliefs override obligations, intentions, and 
desires; in stable conflict resolutions intentions override desires and obliga- 
tions; in unstable conflict resolutions desires and obligations override inten- 
tions; in selfish conflict resolutions desires override obligations; and in social 
conflict resolutions obligations override desires. 

2. We propose several desired and undesired properties to analyze this overrid- 
ing encoded in the BOID logic. As our running example we show how beliefs 
override desires to block wishful thinking. For example, assume that you 
believe that you get wet irrespective of your desire to stay dry. This would, 
according to Thomason, imply that the belief to get wet overrides the desire 
to stay dry, in the sense that in your planning you will assume that you will 
get wet. 

The layout of this paper is as follows. In Section 2 different types of conflicts 
are introduced and a classification of conflict resolution types is discussed. In 
Section 3 the BOID logic and its extension calculation scheme are introduced. 
In Section 4 properties for wishful thinking are analyzed, and in Section 5 we 
discuss the properties of extensions provided by the BOID calculation scheme. 

2 Beliefs, Obligations, Intentions, and Desires 

Reasoning about beliefs, obligations, intentions and desires has been discussed 
in practical reasoning in philosophy [21,2], and its formalization to build intel- 
ligent autonomous agents has more recently been discussed in qualitative de- 
cision making in artificial intelligence [7,8,14,18]. On closer inspection each of 
these four concepts consists of related (though often quite distinct) concepts, 
for example respectively knowledge and defaults, prohibitions and permissions, 
commitments and plans, wishes and wants. All these concepts are grouped into 
these four classes due to their role in the decision making process: beliefs are in- 
formational states - how the world is expected to be - obligations and desires are 
the external and internal motivational states, and intentions are the deliberative 
states. 



2.1 Conflict Resolutions 

A conflict resolution type is an order of overruling. Given four attitudes, there are 
twenty-four possible total orders of overruling, and many more partial orders in 
which for example desires and obligations are equivalent. In this paper, we only 
consider those orders according to which beliefs overrule any other attitude. This 
reduces the number of possible total overruling orders to six. Some examples of 
conflict resolution are given below. 
— A conflict between a belief and a prior intention means that an intended 
action can no longer be executed due to the changing environment. Beliefs 
therefore overrule the prior intention, which is retracted. Any derived con- 
sequences of this prior intention are retracted too. Of course, one may allow 
prior intentions to overrule beliefs, but this results in unrealistic behavior. 

— A conflict between a belief and an obligation or desire means that a violation 
has occurred. As observed by Thomason [18], the beliefs must override the 
desires or otherwise there is wishful thinking; the same argument applies to 
obligations. 

— A conflict between a prior intention and an obligation or desire means that 
you now should or want to do something else than you intended before. Here 
prior intentions override the latter because it is exactly this property for 
which intentions have been introduced: to bring stability. However, in cases 
of intention reconsideration such conflicts may be resolved otherwise. For 
example, if I intend to go to the cinema but I am obliged to visit my mother, 
then I go to the cinema unless I reconsider my intentions. 
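
The counting argument of this section can be checked with a few lines of
Python. The sketch below is only illustrative: it writes each total order of
overruling as a four-letter string with the most overruling attitude first,
and the names `overrides`, `realistic`, `stable` and `social` are our own
labels for the classes described in the introduction.

```python
from itertools import permutations

ATTITUDES = "BOID"  # Beliefs, Obligations, Intentions, Desires

# Every total order of overruling, written most-overruling attitude first.
all_orders = ["".join(p) for p in permutations(ATTITUDES)]

# Realistic orders: beliefs overrule every other attitude.
realistic = [o for o in all_orders if o[0] == "B"]


def overrides(order: str, x: str, y: str) -> bool:
    """True if attitude x overrules attitude y in the given total order."""
    return order.index(x) < order.index(y)


# Stable: intentions overrule desires and obligations.
stable = [o for o in realistic if overrides(o, "I", "D") and overrides(o, "I", "O")]
# Social: obligations overrule desires.
social = [o for o in realistic if overrides(o, "O", "D")]
```

Of the twenty-four total orders, six are realistic; two of these are also
stable, and three are also social.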

2.2 Detecting versus Resolving Conflicts 

Further specifying and implementing the conflict types leads to several compli- 
cations. It may seem that we can use one of the many approaches to conflict 
resolution developed in other areas of artificial intelligence like for example di- 
agnosis [16], default reasoning or fusion of knowledge and databases. However, 
in these approaches a conflict is defined as a minimal set, in the sense that if two 
sets are conflict sets then one of the sets cannot be a strict subset of the other 
one. Whereas minimal sets may be useful to detect conflicts, it is not sufficient 
to resolve them. 

An example has been given by Dignum et al. [7], who discuss an extension of 
the BDI logic with obligations. In this example, there is a guy called Al who has 
an obligation to perform a task for Bob and another incompatible obligation to 
perform a task for Chris. Moreover, Al has the norm that he should tell Bob if 
he does not intend to meet this obligation. The problem discussed in the paper 
is that the existence of the norm should affect Al’s decision on whether to intend 
to fulfill his obligation: 

“Consider Al’s obligation above, until he actually commits to not meet- 
ing his obligation to Bob, the need to tell Bob does not exist, yet the 
potential for it may have a significant impact on his decision on whether 
to do the task for Bob. For example, imagine that the task is trivial (i.e., 
the direct consequences of not doing the task are small), but the social 
consequences of not informing Bob are very high (i.e., Al is perceived as 
unreliable).” [7, p.115] 

The point is thus that to resolve the conflict we cannot restrict ourselves to the
minimal set (the two obligations), but we have to consider the whole set. In
general, agents should consider the effects of actions before committing to them.
This is the reason why in the BOID logic complete extensions are constructed
before one is selected, instead of resolving each conflict as it is encountered.




Resolving Conflicts between Beliefs, Obligations, Intentions, and Desires 571 



3 BOID Logic 

In this section we discuss the BOID logic. First, we consider Reiter’s normal 
default logic and Thomason’s BD logic. 

3.1 Reiter’s Normal Default Logic 

Reiter defined extensions of normal default theories as follows, where we write
$\alpha \hookrightarrow w$ for $(\alpha : Mw / w)$ and we write $(W, D)$ instead of $(D, W)$.

Definition 1. [15, Def. 1] Let $\Delta = (W, D)$ be a closed default theory, so that
every default of $D$ has the form $\alpha \hookrightarrow w$ where $\alpha$ and $w$ are both closed wffs of a
(first-order) language $L$, and let $Th_L(S)$ be the consequence set of $S$ in $L$. For
any set of closed wffs $S \subseteq L$ let $\Gamma(S)$ be the smallest set of closed formulas from
$L$ satisfying the following three properties:

1. $W \subseteq \Gamma(S)$

2. $Th_L(\Gamma(S)) = \Gamma(S)$

3. If $\alpha \hookrightarrow w \in D$, $\alpha \in \Gamma(S)$ and $\neg w \notin S$, then $w \in \Gamma(S)$.

A set of closed wffs $E \subseteq L$ is an extension for $\Delta$ iff $\Gamma(E) = E$, i.e. iff $E$ is a
fixed point of the operator $\Gamma$.

A well-known theorem of Reiter’s paper is the following more intuitive char- 
acterization of extensions. 

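Theorem 1. [15, Th. 2.1] Let $E \subseteq L$ be a set of closed wffs, and let $\Delta = (W, D)$
be a closed default theory. Define
$E_0 = W$
and for $i \geq 0$

$E_{i+1} = Th_L(E_i) \cup \{w \mid \alpha \hookrightarrow w \in D \text{ where } \alpha \in E_i \text{ and } \neg w \notin E\}$

Then $E$ is an extension for $\Delta$ iff
$E = \cup_{i=0}^{\infty} E_i$.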
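
The iterative characterization lends itself to a guess-and-check procedure. The
following Python sketch is a deliberately restricted illustration, not Reiter's
general construction: it assumes that facts, prerequisites and consequents are
all literals (strings, with "-" for negation), that $W$ is consistent, and it
replaces $Th_L$ by the identity on sets of literals. The names `neg`,
`is_extension` and `extensions` are hypothetical.

```python
from itertools import chain, combinations
from typing import FrozenSet, List, Optional, Set, Tuple

Default = Tuple[Optional[str], str]  # (prerequisite, consequent); None stands for "true"


def neg(lit: str) -> str:
    """Negate a literal written as "p" or "-p"."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def is_extension(W: Set[str], D: List[Default], E: Set[str]) -> bool:
    """Check a candidate E against the sequence E_0, E_1, ... (literal-only theories)."""
    current = set(W)  # E_0 = W
    while True:
        # Consequents of defaults whose prerequisite is in E_i and that E does not block.
        fired = {w for (a, w) in D if (a is None or a in current) and neg(w) not in E}
        nxt = current | fired
        if nxt == current:  # the union of all E_i has been reached
            return current == set(E)
        current = nxt


def extensions(W: Set[str], D: List[Default]) -> List[FrozenSet[str]]:
    """Guess-and-check over all candidates made of W plus some consequents."""
    consequents = sorted({w for (_, w) in D} - set(W))
    subsets = chain.from_iterable(
        combinations(consequents, k) for k in range(len(consequents) + 1)
    )
    return [frozenset(W) | frozenset(s) for s in subsets if is_extension(W, D, set(W) | set(s))]
```

Note that the blocking test uses the guessed set `E` rather than the current
stage, exactly as in the theorem; this is why the characterization is a test
for a candidate and not a direct construction.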

3.2 Thomason’s BD Logic 

Thomason [18] proposes a so-called BDP-logic for beliefs, desires and planning
which is capable of modeling a wide range of common-sense practical arguments,
and which can serve as a more general and flexible model for the decision making
process. Thomason first discusses the BD formalism and focuses on the interaction
between beliefs and desires. The basic idea is to model beliefs and desires
both as Reiter defaults [15], without modalities for belief or desire, such that
the extensions contain all the derived atoms. That is, a BD-basis is a tuple
$\langle Obs, NB, ND \rangle$ with $Obs$ a set of formulas, $NB$ a set of B-defaults ‘if $\alpha$ then
I believe $x$’ written as $\alpha \stackrel{B}{\hookrightarrow} x$, and $ND$ a set of D-defaults ‘if $\alpha$ then I desire
$x$’ written as $\alpha \stackrel{D}{\hookrightarrow} x$. Extensions are built iteratively by applying default rules
without distinguishing between beliefs and desires, so for example, the BD-basis
$(\{a\}, \{a \stackrel{B}{\hookrightarrow} b\}, \{b \stackrel{D}{\hookrightarrow} c\})$ has as an extension $Th_L(\{a, b, c\})$. But then, there are
two types of conflicts:
— Conflicts between a belief and a desire lead to overriding of desire by belief 
to block wishful thinking. 

— Other conflicts, for instance, one between two desires or between two beliefs 
lead to multiple extensions. 

Central in Thomason’s iterative calculation of extensions is that belief and desire 
defaults are treated equally, except for the situations where a desire default 
conflicts with a subset of the belief defaults applied to the formulas derived in 
the sequence so far. In such a conflicting situation, the belief defaults are applied 
preferably. 

3.3 BOID Logic 

The BOID logic extends Thomason’s idea with obligations and intentions (like
[7]). This logic consists of four sets of propositional
logical formulae that represent the four attitudes Beliefs, Obligations, Intentions, 
and Desires. One reason for this extension is to incorporate elements of the social 
level, i.e. social commitments, to formalize for example social agents and social 
rationality. The BOID logic is parameterized in order to resolve conflicts between 
attitudes according to a complete conflict resolution type. This input parameter 
constrains the order in which derivation steps for different sets are undertaken 
and characterizes the type of conflict resolution. 

The iterative procedure of the BOID calculation scheme is given as an exten- 
sion of Reiter’s more intuitive characterization of extensions in Theorem 1. As 
in [12] we assume that there is an order on the rules, which we represent by $\rho$.
In order to define this calculation scheme, we first define an ordering function $\rho$
that represents the conflict resolution type. In case of multiple applicable rules,
one with the lowest $\rho$ value is applied.

Definition 2. Let $L$ be a propositional language and $S$ be a set of ordered pairs
of $L$ written as $\alpha \hookrightarrow w$ and called rules. An agent type is a set of functions $\rho$
from $S$ to the integers.

The agent type is usually expressed as a constraint. For example, if $S$ is the
union of beliefs $B$ and desires $D$, then the agent type ‘realistic’ is expressed by
the constraint that for all $r_b \in B$ and $r_d \in D$ we have $\rho(r_b) < \rho(r_d)$. Given a
specific agent type, the calculation scheme for building extensions is defined as
follows.

Definition 3 (BOID Calculation Scheme). Let $L$ be a propositional language,
let a tuple $\Delta = \langle W, B, O, I, D \rangle$ be a BOID theory with $W$ a subset of $L$
and $B$, $O$, $I$ and $D$ sets of ordered pairs of $L$ written as $\alpha \hookrightarrow w$, let $\rho$ be a
function that assigns to each rule in $B \cup O \cup I \cup D$ a unique integer, and $S$ a
subset of $L$. Moreover, let

$\rho_{min}(BOID, S) = \min\{\rho(\alpha \hookrightarrow w) \mid \alpha \hookrightarrow w \in B \cup O \cup I \cup D,\ \alpha \in S,\ \neg w \notin S\}$
$min(BOID, S) = w$ s.t. $\alpha \hookrightarrow w \in B \cup O \cup I \cup D$, $\rho(\alpha \hookrightarrow w) = \rho_{min}(BOID, S)$

Define
$E_0 = W$
and for $i \geq 0$

$E_{i+1} = Th_L(E_i \cup \{min(BOID, S)\})$ if such a minimal element exists,
$E_{i+1} = E_i$ otherwise.

Then $E \subseteq L$ is an extension for $\Delta$ of agent type $A$ iff $\exists \rho \in A$ s.t. $E = \cup_{i=0}^{\infty} E_i$.
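
A minimal sketch of the greedy calculation scheme, under simplifying
assumptions that are ours and not part of the definition: all formulas are
literals (strings, with "-" for negation), $Th_L$ is replaced by the identity
on sets of literals, the set $S$ is instantiated with the current stage $E_i$,
and a rule whose consequent is already derived is skipped so that the loop
terminates. The names `Rule`, `neg` and `boid_extension` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Rule = Tuple[str, Optional[str], str]  # (attitude, antecedent, consequent); None means "true"


def neg(lit: str) -> str:
    """Negate a literal written as "p" or "-p"."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def boid_extension(W: Set[str], rules: List[Rule], rho: Dict[Rule, int]) -> Set[str]:
    """Greedily apply, at each step, the applicable rule with the lowest rho value."""
    E = set(W)  # E_0 = W
    while True:
        applicable = [
            r for r in rules
            if (r[1] is None or r[1] in E)  # antecedent holds in the current stage
            and neg(r[2]) not in E          # consequent is not contradicted
            and r[2] not in E               # assumption: ignore rules already applied
        ]
        if not applicable:
            return E  # no minimal element exists: E_{i+1} = E_i
        E.add(min(applicable, key=lambda r: rho[r])[2])


# Rain example: a belief rule "rain -> wet" and a desire rule "true -> not wet".
believe_wet: Rule = ("B", "rain", "wet")
desire_dry: Rule = ("D", None, "-wet")
realistic = boid_extension({"rain"}, [believe_wet, desire_dry], {believe_wet: 1, desire_dry: 2})
wishful = boid_extension({"rain"}, [believe_wet, desire_dry], {believe_wet: 2, desire_dry: 1})
```

With the realistic ordering the belief rule fires first and blocks the desire,
whereas reversing the two $\rho$ values lets the desire block the belief, which
is precisely the wishful thinking that realistic agent types rule out.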



3.4 Discussion 

Space does not permit us to compare the BOID logic in any detail with classical 
approaches to specification and verification of agent systems, based on for exam- 
ple modal and temporal logics like BDICTL [14,17]. We just make the following 
remarks: 

— The analysis of conflicts in BDICTL is limited, in the sense that for example 
two conflicting desires cannot be represented in a consistent way. 

— The representation of conditionals in BDICTL is not straightforward, 
whereas this is a central issue in BOID logics. 

— To compare BDICTL and BOID logic the propositional base language of 
BOID logic must be replaced by BDICTL.^ 

— Each state in the BOID logic has the same logic, i.e. normal default logic, 
but it can be further developed such that for example for obligations and 
desires we do not have that inputs are included in the extensions, see [10,11].

A second and more interesting issue is the comparison of BOID logic with 
extensions of default logic such as preferred answer sets [3]. One of the results 
obtained here is that a greedy approach as used in the BOID logic (always try 
to apply the rule with the highest priority) may lead to globally suboptimal 
results (e.g. by first applying a rule of priority 3 instead of one of priority 2 we 
can thereafter apply a rule of priority 1 - by convention the highest priority). 
The greedy approach is justified by the fact that the BOID logic is only an 
idealization. In reality fixpoints may never be reached due to limited resources. 



4 No Wishful Thinking 

Thomason [18] argues that beliefs override desires with the following example. 
If you think it is going to rain and you believe that if it rains, you will get wet, 
and you would not like to get wet, then you have to conclude that you get wet. 
Beliefs therefore prevail in conflicts with desires. 

^ This extension is not as interesting as it may seem at first sight, because the exten- 
sions are used in the agent’s planning and to plan to achieve goal p it is irrelevant 
whether there is an intention, desire or obligation to see to it that p. Note that it is 
important in the implementation [4]. There have also been convincing philosophical 
arguments to do without modal operators, see [9]. Advantages of this extension are 
the formalization of more complex notions like permissions and ignorance. 
How can we formulate this intuition as a property of extensions? In this 
section we consider three properties that guarantee that beliefs override desires. 
These properties are not restricted to one particular approach, but can be applied 
to any extension-based approach. To facilitate the definitions of the properties 
in this section we use the following definition. 

Definition 4. Let $\Delta = \langle W, B, D \rangle$ be a BD theory, where $W$ is a set of propositional
sentences and $B$ and $D$ ordered pairs of such sentences. We write $E_{BD}(\Delta)$
for the set of all extensions of a propositional BD theory, and for representational
convenience we write $E_{BD}(W, B, D)$ for $E_{BD}(\langle W, B, D \rangle)$.

4.1 Applied Desire Rules 

The intuition behind Property 1 of no wishful thinking below is as follows. If 
in a conflict between a desire and a belief the desire rule is removed, then the 
extension cannot increase because the belief rule already had priority over the 
desire rule. In other words, the removal of desires can only decrease the extension, 
not increase it or remove it. 

Property 1 (Applied D rules; first attempt). For each $E' \in E_{BD}(W, B, D')$ and
$D \subseteq D'$ there is an $E \in E_{BD}(W, B, D)$ such that $E \subseteq E'$.

The following example illustrates that Property 1 is, unfortunately, too 
strong. 

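Example 1. Let $\Delta_1 = (\emptyset, \emptyset, \{\top \stackrel{D}{\hookrightarrow} p\})$ and $\Delta_2 = (\emptyset, \emptyset, \{\top \stackrel{D}{\hookrightarrow} p, \top \stackrel{D}{\hookrightarrow} \neg p\})$.
Intuitively we have $E_{BD}(\Delta_1) = \{Th_L(p)\}$ and $E_{BD}(\Delta_2) = \{Th_L(p), Th_L(\neg p)\}$.
But for $E' = Th_L(\neg p) \in E_{BD}(\Delta_2)$, there is no $E \in E_{BD}(\Delta_1)$ such that $E \subseteq E'$.
This example contradicts Property 1.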
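
The failure of Property 1 on Example 1 can be replayed mechanically. In the
sketch below each extension $Th_L(\ldots)$ is represented by its generating set
of literals, which is adequate here only because the theories involve
consistent sets of literals (for such sets, inclusion of the generators
coincides with inclusion of the deductive closures). The function name
`satisfies_property_1` is hypothetical, and the sets of extensions are the
intuitive ones stated in the example, not the output of a calculation scheme.

```python
from typing import FrozenSet, Set

Extension = FrozenSet[str]  # an extension, represented by its generating literals


def satisfies_property_1(ext_small: Set[Extension], ext_large: Set[Extension]) -> bool:
    """For each E' of the larger theory there must be an E of the smaller one with E <= E'."""
    return all(any(e <= e_prime for e in ext_small) for e_prime in ext_large)


ext_delta1 = {frozenset({"p"})}                     # extensions of Delta_1
ext_delta2 = {frozenset({"p"}), frozenset({"-p"})}  # extensions of Delta_2

property_1_holds = satisfies_property_1(ext_delta1, ext_delta2)
```

The check fails because no extension of $\Delta_1$ is contained in $Th_L(\neg p)$.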

Example 1 also illustrates where our first attempt goes wrong. The problem
is that $D$ may contain rules which have not been used to build $E'$ of
$E_{BD}(W, B, D')$, but they may be used when building $E$ of $E_{BD}(W, B, D)$. In
the example, this rule was $\top \stackrel{D}{\hookrightarrow} p$. We first introduce a definition to identify
an extension with the set of rules which are applied in it (sometimes called its
generators).

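Definition 5 (Applied rules). Let $\Delta = \langle W, B, D \rangle$ be a BD theory and let
the set $E$ be one of its extensions. The set of applied rules in extension $E$ is

$R_B(\Delta, E) = \{\alpha \stackrel{B}{\hookrightarrow} w \in B \mid \alpha \wedge w \in E\}$, $R_D(\Delta, E) = \{\alpha \stackrel{D}{\hookrightarrow} w \in D \mid \alpha \wedge w \in E\}$,
and $R(\Delta, E) = R_B(\Delta, E) \cup R_D(\Delta, E)$.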

The following Property 2 is a weaker form of Property 1, because we have
$R_D(\langle W, B, D' \rangle, E') \subseteq D'$.

Property 2 (Applied D rules, second attempt). For each $E' \in E_{BD}(W, B, D')$
and $D \subseteq R_D(\langle W, B, D' \rangle, E')$ there is an $E \in E_{BD}(W, B, D)$ such that $E \subseteq E'$.
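The following example reconsiders Example 1 and illustrates that Property 2
does not have the undesirable behavior.

Example 2. $\Delta_1 = (\emptyset, \emptyset, \{\top \stackrel{D}{\hookrightarrow} p\})$, $\Delta_2 = (\emptyset, \emptyset, \{\top \stackrel{D}{\hookrightarrow} p, \top \stackrel{D}{\hookrightarrow} \neg p\})$.
As mentioned in Example 1, $E_{BD}(\Delta_1) = \{Th_L(p)\}$ and $E_{BD}(\Delta_2) =
\{Th_L(p), Th_L(\neg p)\}$ contradict Property 1. However, it does not contradict Property 2,
because for $E' = Th_L(\neg p)$ we have $R_D((\emptyset, \emptyset, \{\top \stackrel{D}{\hookrightarrow} p, \top \stackrel{D}{\hookrightarrow} \neg p\}), E') =
\{\top \stackrel{D}{\hookrightarrow} \neg p\}$, and this set is not a superset of the desire rules in $\Delta_1$.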

The following simple examples further illustrate Property 2. 

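Example 3. $\Delta_1 = (\emptyset, \{\top \stackrel{B}{\hookrightarrow} p, q \stackrel{B}{\hookrightarrow} \neg p\}, \emptyset)$, $\Delta_2 = (\emptyset, \{\top \stackrel{B}{\hookrightarrow} p, q \stackrel{B}{\hookrightarrow} \neg p\}, \{\top \stackrel{D}{\hookrightarrow} q\})$.
If $E_{BD}(\Delta_1) = \{Th_L(p)\}$, then each element of $E_{BD}(\Delta_2)$ has to contain $Th_L(p)$,
and $E_{BD}(\Delta_2)$ thus cannot contain for example $Th_L(q \wedge \neg p)$.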

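Example 4. $\Delta_1 = (\emptyset, \{\top \stackrel{B}{\hookrightarrow} p\}, \emptyset)$, $\Delta_2 = (\emptyset, \{\top \stackrel{B}{\hookrightarrow} p\}, \{\top \stackrel{D}{\hookrightarrow} q, p \stackrel{D}{\hookrightarrow} \neg q\})$. If
$E_{BD}(\Delta_1) = \{Th_L(p)\}$, then each element of $E_{BD}(\Delta_2)$ has to contain $Th_L(p)$,
but $E_{BD}(\Delta_2)$ still can contain for example $Th_L(p, q)$ and $Th_L(p, \neg q)$.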

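Example 5. $\Delta_1 = (\emptyset, \{p \stackrel{B}{\hookrightarrow} \neg q\}, \{\top \stackrel{D}{\hookrightarrow} p\})$, $\Delta_2 = (\emptyset, \{p \stackrel{B}{\hookrightarrow} \neg q\}, \{\top \stackrel{D}{\hookrightarrow} p, \top \stackrel{D}{\hookrightarrow} q\})$.
If $E_{BD}(\Delta_1) = \{Th_L(p, \neg q)\}$ then generalized no-wishful thinking based on applied
desire rules implies $Th_L(p, q) \notin E_{BD}(\Delta_2)$. However, note that $Th_L(q)$ may
be in $E_{BD}(\Delta_2)$ (verification left to the reader).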

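Example 6. $\Delta_1 = (\emptyset, \{p \stackrel{B}{\hookrightarrow} \neg q, r \stackrel{B}{\hookrightarrow} q\}, \{\top \stackrel{D}{\hookrightarrow} p\})$, $\Delta_2 = (\emptyset, \{p \stackrel{B}{\hookrightarrow} \neg q, r \stackrel{B}{\hookrightarrow} q\},
\{\top \stackrel{D}{\hookrightarrow} p, \top \stackrel{D}{\hookrightarrow} r\})$. If $E_{BD}(\Delta_1) = \{Th_L(p, \neg q)\}$ then we have that the sets
$Th_L(p, r, \neg q), Th_L(p, r, q) \notin E_{BD}(\Delta_2)$ but $Th_L(p, \neg q)$ and $Th_L(r, q)$ may be in
$E_{BD}(\Delta_2)$ (analogous to the previous example, verification left to the reader).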

A simple instance of this generalized no-wishful thinking property, which we 
call Restricted no-wishful thinking, is the case where D is the empty set. This 
property says that every BD extension extends a B extension. 

Property 3 (Restricted Applied D rules). For each $E' \in E_{BD}(W, B, D')$ there is
an $E \in E_{BD}(W, B, \emptyset)$ such that $E \subseteq E'$.

4.2 Applied Belief Rules 

The second way to define no wishful thinking we consider is to look for a con- 
straint on just the beliefs. The set of applicable belief rules of one extension 
cannot be a strict subset of the applicable belief rules of another extension. 

Property 4 (Applied B rules, first attempt). For all $E_1, E_2 \in E_{BD}(\Delta)$ we have
$R_B(\Delta, E_1) \subseteq R_B(\Delta, E_2)$ implies $R_B(\Delta, E_1) = R_B(\Delta, E_2)$.
Unfortunately, this property does not give intuitive results, as the following
example illustrates.

Example 7. Let $\Delta = (\emptyset, \{p \stackrel{B}{\hookrightarrow} q\}, \{\top \stackrel{D}{\hookrightarrow} p, \top \stackrel{D}{\hookrightarrow} \neg p\})$. Intuitively we have
$E_{BD}(\Delta) = \{Th_L(p, q), Th_L(\neg p)\}$, i.e. $R_B(\Delta, Th_L(\neg p)) \subset R_B(\Delta, Th_L(p, q))$.
This example contradicts Property 4.

The following property is a variant of Property 1. The removal of desires can 
only decrease the set of applied belief rules, not increase it or remove it. 

Property 5 (Applied B rules, second attempt). For each $E' \in E_{BD}(W, B, D')$
and $D \subseteq D'$ there is an $E \in E_{BD}(W, B, D)$ such that $R_B(\langle W, B, D \rangle, E) \subseteq
R_B(\langle W, B, D' \rangle, E')$.

Property 5 gives the desired results for the rule sets in Example 1 and 7. 
However, Example 8 is a generalization of these two examples that shows why 
Property 5 has similar problems as Property 1. 

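Example 8. $\Delta_1 = (\emptyset, \{p \stackrel{B}{\hookrightarrow} q\}, \{\top \stackrel{D}{\hookrightarrow} p\})$, $\Delta_2 = (\emptyset, \{p \stackrel{B}{\hookrightarrow} q\}, \{\top \stackrel{D}{\hookrightarrow} p, \top \stackrel{D}{\hookrightarrow} \neg p\})$.
Intuitively we have $E_{BD}(\Delta_1) = \{Th_L(p, q)\}$ and $E_{BD}(\Delta_2) =
\{Th_L(p, q), Th_L(\neg p)\}$. However, for $E' = Th_L(\neg p) \in E_{BD}(\Delta_2)$ there is
no $E \in E_{BD}(\Delta_1)$ such that $R_B(\Delta_1, E) \subseteq R_B(\Delta_2, E')$.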

The following property is analogous to Property 2. 

Property 6 (Applied B rules, third attempt). For each $E' \in E_{BD}(W, B, D')$ and
$D \subseteq R_D(\langle W, B, D' \rangle, E')$ there is an $E \in E_{BD}(W, B, D)$ such that we have
$R_B(\langle W, B, D \rangle, E) \subseteq R_B(\langle W, B, D' \rangle, E')$.

The following example illustrates the distinction between Property 2 and 6. 

Example 9. Let $\Delta_1 = (\emptyset, \emptyset, \{\top \stackrel{D}{\hookrightarrow} p\})$ and $\Delta_2 = (\emptyset, \emptyset, \{\top \stackrel{D}{\hookrightarrow} p, \top \stackrel{D}{\hookrightarrow} q\})$. If
$E_{BD}(\Delta_1) = Th_L(p)$ then we cannot have $Th_L(\emptyset) \in E_{BD}(\Delta_2)$ according to
generalized no wishful thinking based on applied desire rules, but it can be
according to generalized no wishful thinking based on applied belief rules.

Intuitively we do not want $Th_L(\emptyset)$ in $E_{BD}(\Delta_2)$, but the reason for this
is not the blocking of wishful thinking. Property 6 seems therefore a better
characterization of no-wishful thinking than Property 2.

Property 7 is analogous to Property 3. 

Property 7 (Restricted Applied B rules). For each $E' \in E_{BD}(W, B, D')$ there is
an $E \in E_{BD}(W, B, \emptyset)$ such that $R_B(E) \subseteq R_B(E')$.
4.3 Abnormal Belief Rules 

The third way to define no wishful thinking is not based on applied rules but on 
rules which could not be applied, which we call abnormal rules. These abnormal 
rules are defined analogously to applied rules in Definition 5. 

Definition 6 (Abnormal rules). Let $\Delta = \langle W, B, D \rangle$ be a BD theory and let
the set $E$ be one of its extensions. The set of abnormal rules is represented by

$Ab_B(\Delta, E) = \{\alpha \stackrel{B}{\hookrightarrow} w \in B \mid \alpha \wedge \neg w \in E\}$.
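
Definitions 5 and 6 can be illustrated together. The sketch below assumes
literal-only rules (strings, with "-" for negation, and `None` for a
tautological antecedent) and represents an extension by its set of literals,
so that the test $\alpha \wedge w \in E$ reduces to two membership checks. The
names `holds`, `applied` and `abnormal` are hypothetical; the data is the
theory $(\{\neg q\}, \{p \hookrightarrow q\}, \{\top \hookrightarrow p\})$ with the candidate
extension $Th_L(\neg q, p)$.

```python
from typing import List, Optional, Set, Tuple

Rule = Tuple[Optional[str], str]  # (antecedent, consequent); None stands for "true"


def neg(lit: str) -> str:
    """Negate a literal written as "p" or "-p"."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def holds(antecedent: Optional[str], E: Set[str]) -> bool:
    """A tautological antecedent always holds; otherwise it must be in E."""
    return antecedent is None or antecedent in E


def applied(rules: List[Rule], E: Set[str]) -> List[Rule]:
    """Rules whose antecedent and consequent are both in E (Definition 5)."""
    return [(a, w) for (a, w) in rules if holds(a, E) and w in E]


def abnormal(rules: List[Rule], E: Set[str]) -> List[Rule]:
    """Rules whose antecedent is in E together with the negated consequent (Definition 6)."""
    return [(a, w) for (a, w) in rules if holds(a, E) and neg(w) in E]


B = [("p", "q")]   # belief rule p -> q
D = [(None, "p")]  # desire rule true -> p
E = {"-q", "p"}

applied_beliefs = applied(B, E)
applied_desires = applied(D, E)
abnormal_beliefs = abnormal(B, E)
```

In this extension the desire rule is applied while the only belief rule is
abnormal, which is the situation that the abnormality-based property is meant
to detect.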

Generalized no wishful thinking based on abnormal belief rules is defined 
analogous to generalized no wishful thinking property based on applied rules in 
Property 2 and 6. 

Property 8 (Abnormal B rules, first attempt). For each $E' \in E_{BD}(W, B, D')$
and $D \subseteq R_D(\langle W, B, D' \rangle, E')$ there is an $E \in E_{BD}(W, B, D)$ such that we have
$Ab_B(\langle W, B, D \rangle, E) \supseteq Ab_B(\langle W, B, D' \rangle, E')$.

The following example illustrates that generalized no wishful thinking based on
abnormal belief rules is different from generalized no wishful thinking based on
applied desire or belief rules in Property 2 and 6.

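Example 10. Let $\Delta_1 = (\{\neg q\}, \{p \stackrel{B}{\hookrightarrow} q\}, \emptyset)$ and $\Delta_2 = (\{\neg q\}, \{p \stackrel{B}{\hookrightarrow} q\}, \{\top \stackrel{D}{\hookrightarrow} p\})$.
If $Th_L(\neg q, p) \notin E_{BD}(\Delta_1)$ then according to generalized no wishful thinking
based on abnormal belief rules $Th_L(\neg q, p) \notin E_{BD}(\Delta_2)$. However, according to
generalized no wishful thinking based on applied desire or belief rules, it may be
that $Th_L(\neg q, p) \notin E_{BD}(\Delta_1)$ as well as $Th_L(\neg q, p) \in E_{BD}(\Delta_2)$.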

5 BOID Properties 

The first BOID property is called Existence and says that there is at least one 
BD extension, if the facts W are consistent. This is a very desirable and crucial 
property for decision making agents, because an agent needs an extension to act 
rationally. Otherwise the agent is stuck or starts to make random movements. 

Property 9 (Existence). $E_{BD}(W, B, D) \neq \emptyset$ if $\bot \notin Th_L(W)$.

The second BOID property we discuss here is called BD maximality, and says 
that if a rule can be applied then it is applied. That is, we go as far as possible. 
This property implies that the set of BD extensions are a subset of the set of 
Reiter extensions where the set of rules consists of the union of belief and desire 
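rules. We write $E_R(\Delta)$ for the set of all Reiter extensions of a propositional
default theory, and if we consider Reiter extensions of BD theories consisting of
$\alpha \stackrel{B}{\hookrightarrow} w$ and $\alpha \stackrel{D}{\hookrightarrow} w$, then we ignore the superscript above the arrows, i.e. we
interpret $E_R(\langle W, BD \rangle)$ as $E_R(\langle W, \{\alpha \hookrightarrow w \mid \alpha \stackrel{B}{\hookrightarrow} w \in BD \text{ or } \alpha \stackrel{D}{\hookrightarrow} w \in BD\} \rangle)$.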

Property 10 (BD maximality). $E_{BD}(W, B, D) \subseteq E_R(W, B \cup D)$
The following example reconsiders Example 10 and questions the BD maximality
property.

Example 11. Let $\Delta = (\{\neg q\}, \{p \stackrel{B}{\hookrightarrow} q\}, \{\top \stackrel{D}{\hookrightarrow} p\})$ (we can also replace $\neg q$ by
$\top \stackrel{B}{\hookrightarrow} \neg q$). We have $E_R(\Delta) = \{Th_L(\neg q, p)\}$, and thus with the existence
property and the BD maximality property we can derive $E_{BD}(\Delta) = \{Th_L(\neg q, p)\}$.
However, $\neg q \wedge p$ implies that to fulfill the desire for $p$ we get into a situation in
which something happens which we believe will not happen, namely the exception
to the belief that $p$ implies $q$.

The following theorem and its corollary suggest that BD maximality is too 
strong (see [11] for an alternative notion of extension). 

Theorem 2. No-wishful thinking based on applied desire rules (Property 2), on 
applied belief rules (Property 6) or on abnormal belief rules (Property 8) conflicts 
with BD maximality (Property 10) together with Existence (Property 9). 

Proof. For the applied rules, see Example 5 and 6. For the abnormal rules, 
see Example 10 and 11. 

Corollary 1. The BOID logic does not satisfy any of the three notions of no- 
wishful thinking discussed in this paper. 

6 Concluding Remarks 

We have discussed possible conflict types that may arise within or among in- 
formational and motivational attitudes and explained how these conflicts can 
be resolved within the BOID calculation scheme. The resolution of conflicts is 
based on Thomason’s idea of prioritization, which is considered in the BOID lo- 
gic as the order of derivations from different types of attitudes. We have shown 
that the order of derivations determines the type of conflict resolution method. 
For example, deriving desire before beliefs produces wishful thinking and deri- 
ving obligations before desires produces sociality. We have also introduced some 
desired and undesired properties, and checked whether some conflict resolution 
methods satisfied the properties. 

Two issues for further research are the generalization of properties for overri- 
ding to multiple attitudes, and for other input/output logics [10,11] than Reiter’s 
normal default logic. Although the properties are defined independently of the lo-
gic, both Definition 5 and 6 of applied and abnormal rules must be adapted if we
allow for, e.g., reasoning by cases (e.g. $E_{BD}(\emptyset, \{a \xrightarrow{B} w, \neg a \xrightarrow{B} w\}, \emptyset) = Th_L(w)$).

¹ Example 11 is not very convincing, for the following two reasons. First, the
behavior in Example 11 seems to be what is expected from conditional rules. If you
do not like it, then you can formalize the belief rule with $\top \xrightarrow{B} (p \rightarrow q)$, where $\rightarrow$ is a
material implication. Second, in the discussion in Example 11 the rules are used as a
kind of causal rules. However, if the conditional $p \xrightarrow{B} q$ represents a causal relation,
then the world will change such that $\neg q$ will turn into q.







Acknowledgment. Thanks to Salem Benferhat, Zisheng Huang and Joris
Hulstijn for discussions on the issues discussed in this paper.



References 

1. C. Boutilier. Toward a logic for qualitative decision theory. In Proceedings of the
KR'94, pages 75-86, 1994.

2. Michael E. Bratman. Intention, plans, and practical reason. Harvard University
Press, Cambridge Mass, 1987.

3. G. Brewka and T. Eiter. Preferred answer sets for extended logic programs.
Artificial Intelligence, 109:297-356, 1999.

4. J. Broersen, M. Dastani, Z. Huang, J. Hulstijn, and L. van der Torre. The BOID
architecture: Conflicts between beliefs, obligations, intentions, and desires. In
Proceedings of International Conference on Autonomous Agents (AA'01), 2001.

5. P.R. Cohen and H.J. Levesque. Intention is choice with commitment. Artificial
Intelligence, 42:213-261, 1990.

6. F. Dignum. Autonomous agents and norms. Artificial Intelligence and Law,
7:69-79, 1999.

7. F. Dignum, D. Morley, E.A. Sonenberg, and L. Cavedon. Towards socially
sophisticated BDI agents. In Proceedings of the ICMAS 2000, pages 111-118, 2000.

8. Thomas Eiter, V.S. Subrahmanian, and George Pick. Heterogeneous active agents
I: Semantics. Artificial Intelligence, 108(1-2):179-255, 1999.

9. D. Makinson. On a fundamental problem of deontic logic. In Norms, logics and
information systems, pages 29-53. IOS Press, 1999.

10. D. Makinson and L. van der Torre. Input-output logics. Journal of Philosophical
Logic, 29:383-408, 2000.

11. D. Makinson and L. van der Torre. Constraints for input-output logics. Journal
of Philosophical Logic, 30(2):155-185, 2001.

12. V.W. Marek and M. Truszczynski. Nonmonotonic logic: Context-dependent
reasoning. Springer, Berlin, 1993.

13. J. Pearl. From conditional oughts to qualitative decision theory. In Proceedings of
the UAI'93, pages 12-20, 1993.

14. A. Rao and M. Georgeff. BDI agents: From theory to practice. In Proceedings
of the First International Conference on Multi-Agent Systems (ICMAS'95), pages
312-319, 1995.

15. R. Reiter. A logic for default reasoning. Artificial Intelligence, 13:81-132, 1980.

16. R. Reiter. A theory of diagnosis from first principles. Artificial Intelligence,
32:57-95, 1987.

17. K. Schild. On the relationship between BDI logics and standard logics of
concurrency. Autonomous Agents and Multi-Agent Systems, 2000.

18. R. Thomason. Desires and defaults: a framework for planning with inferred goals.
In Proceedings of the KR'2000, pages 702-713. Morgan Kaufmann, 2000.

19. L. van der Torre and Y. Tan. Contrary-to-duty reasoning with preference-based
dyadic obligations. Annals of Mathematics and Artificial Intelligence, 27:49-78,
1999.

20. L. van der Torre and Y. Tan. Diagnosis and decision making in normative
reasoning. Artificial Intelligence and Law, 7:51-67, 1999.

21. G.H. von Wright. Norms, truth and logic. Practical Reason. Blackwell, Oxford,
1983.




Comparing a Pair-wise Compatibility Heuristic
and Relaxed Stratification: Some Preliminary
Results



Robert E. Mercer¹, Lionel Forget, and Vincent Risch²

¹ Cognitive Engineering Laboratory, Dept. of Computer Science, The University of
Western Ontario, London, Ontario, Canada
² InCA Team, LIM - ESA CNRS 6077, Centre de Mathématiques et d'Informatique,
Marseille, France



Abstract. An extension-building heuristic is developed and a prelimi- 
nary investigation of its computational properties is given by comparing 
its run times to those of DeReS which uses relaxed stratification, another 
extension-building heuristic. Heuristics which can take advantage of the 
structural properties of a default theory may provide the information 
about the theory so that divide-and-conquer-like techniques may be ap- 
plied on those problems which exhibit appropriate structural properties. 
Structural properties of a default theory are defined in terms of proper- 
ties of graphs that represent important features of default theories. Un- 
like the syntax-dependent heuristics used in previous extension-building 
algorithms, the heuristic developed here is consistency-based. 



1 Introduction 

The problem of building extensions for propositional default theories, although 
being straightforward from an algorithmic point of view, is in the complex-
ity class $\Sigma_2^P$ [8]. Heuristics that uncover appropriate structural properties of a
default theory may allow divide-and-conquer-like techniques to be applied to 
reduce computation time. Few heuristics of this kind for the extension-building 
problem are known, with the noted exception of relaxed stratification [4]. In 
this paper we develop another extension-building heuristic and report on a pre- 
liminary investigation of its computational effects. Unlike relaxed stratification 
which is motivated by stratification in logic programming, our choice of struc- 
tural property of a default theory has been motivated, in spirit, by work which 
relates graphs and default theories [5-7, 10]. 

The crucial piece of information to build an extension is its set of generat- 
ing defaults. This feature has formed our principal goal: to discover potentially 
valuable, yet relatively easy to compute, information about sets of generating 
defaults. The proposed heuristic is an incremental method for generating the 
smallest possible supersets of generating defaults given only the currently known 
information about the defaults. More precisely, given a graph whose nodes repre- 
sent defaults and whose edges represent pair-wise compatibility between defaults 
(that is, nothing yet rules out these defaults belonging to the same set of generating de-
faults), cliques in this graph represent supersets of sets of generating defaults.
If the superset of a set of generating defaults is a proper subset of the default 
theory, the effort expended by an extension-building algorithm will be reduced. 

We have chosen the complete (but brute force) extension-building method 
proposed in [12], because it is a clear and simple framework, to show this heuristic 
to be complete. We also discuss some preliminary results that we have obtained 
from an experimental study comparing the pair-wise compatibility heuristic and 
the relaxed stratification heuristic that has been implemented in DeReS [4]. 

2 Background 

As defined by Reiter [11], a closed default theory is a pair (W, D) where W is a set
of closed first order sentences and D a set of default rules. A default rule has the
form $\frac{\alpha : \beta}{\gamma}$ where $\alpha$, $\beta$ and $\gamma$ are closed first order sentences¹. $\alpha$ is called the pre-
requisite, $\beta$ the justification and $\gamma$ the consequent of the default. PREREQ(D),
JUST(D) and CONS(D) are respectively the sets of all prerequisites, justifi-
cations and consequents that come from defaults in a set D. Whenever one of
these sets is a singleton, we may identify it with the single element it contains.
For instance, we prefer to consider $PREREQ(\{\frac{\alpha : \beta}{\gamma}\})$ as an element rather than
a set. The following definition shows us how the use of a default is related to its
prerequisite:

Definition 1. [14] A set D of defaults is grounded in W iff for all $\delta \in D$ there
is a finite sequence $\delta_0, \ldots, \delta_k$ of elements of D such that
(1) $PREREQ(\{\delta_0\}) \in Th(W)$,
(2) for $1 \leq i \leq k - 1$, $PREREQ(\{\delta_{i+1}\}) \in Th(W \cup CONS(\{\delta_0, \ldots, \delta_i\}))$,
and $\delta_k = \delta$.

An extension of a default theory is usually defined as a smallest fixed point of 
a set of formulas. It contains W , is deductively closed, and the defaults whose 
consequents belong to the extension verify a property which allows their use. 
The manner in which this property is considered is related to the variant of de- 
fault logic under consideration. In what follows, we consider the characterization 
previously obtained by [12, 13] for the extensions in the sense of [11]. 

Theorem 1. [12, 13] Let $\Delta = (W, D)$ be a default theory. E is an R-extension
for $\Delta$ iff there is $D'$, a grounded subset of D, such that $E = Th(W \cup CONS(D'))$,
and for each default $\delta \in D$ of the form $\frac{\alpha : \beta}{\gamma}$, (i) if $\delta \in D'$ then $\alpha \in E$ and
$\neg\beta \notin E$, (ii) if $\delta \notin D'$ then $\alpha \notin E$ or $\neg\beta \in E$. Each set $D'$ is called a set of
generating defaults.
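
The characterization in Theorem 1 can be illustrated with a minimal sketch. It assumes a much simpler setting than the paper's: every prerequisite, justification and consequent is a single literal (written 'p' or '-p', with None for an empty prerequisite) and W is a set of literals, so that entailment reduces to set membership. The function names and the two-default theory at the end are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def neg(lit):
    """Complement of a literal written as 'p' or '-p'."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def consistent(lits):
    return all(neg(l) not in lits for l in lits)

def entails(lits, lit):
    """Does a set of literals entail `lit`? None stands for an empty prerequisite."""
    return lit is None or not consistent(lits) or lit in lits

def grounded(W, chosen):
    """Definition 1: every chosen default can be fired in some order starting from W."""
    derived, pending = set(W), list(chosen)
    progress = True
    while progress:
        progress = False
        for d in list(pending):
            if entails(derived, d[0]):
                derived.add(d[2])
                pending.remove(d)
                progress = True
    return not pending

def is_generating(W, D, chosen):
    """Check conditions (i) and (ii) of Theorem 1 for a candidate subset of D."""
    if not grounded(W, chosen):
        return False
    E = set(W) | {d[2] for d in chosen}  # literals generating Th(W u CONS(D'))
    for pre, just, cons in D:
        applicable = entails(E, pre) and not entails(E, neg(just))
        if ((pre, just, cons) in chosen) != applicable:
            return False
    return True

def generating_sets(W, D):
    """Brute-force enumeration of all subsets of D that pass the Theorem 1 test."""
    return [c for r in range(len(D) + 1)
            for c in combinations(D, r) if is_generating(W, D, c)]

# Hypothetical theory: W = {a}, defaults a:b/b and a:-b/-b.
W = {"a"}
D = [("a", "b", "b"), ("a", "-b", "-b")]
result = generating_sets(W, D)
```

Under these assumptions each of the two conflicting defaults forms a set of generating defaults on its own, while the empty set and the full set both fail the test.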

Definition 2. For the default theory $\Delta = (W, D)$ and the potential set of gen-
erating defaults $D'$, the default $\delta = \frac{\alpha : \beta}{\gamma} \in D$ is a destroying default² for $D'$ if
$\delta \notin D'$ but $\alpha \in E$ and $\neg\beta \notin E$, and $D' \cup \{\delta\}$ is not a set of generating defaults.

¹ For simplicity, we restrict our attention to defaults with only one justification: the
generalization to several justifications is straightforward, and is described in [1,13].
² These defaults are called selection defaults in [4] (because they are viewed as filters)
and killing defaults in [3].







A complete method for computing the R-extensions of a default theory is 
proposed by [13, 12] based on the characterization in Theorem 1. We refer to this 
method as Algorithm E. We have chosen Algorithm E because it is brute force; 
hence it is simple, clear, and not encumbered by superfluous detail such as other 
heuristics. Roughly speaking, an extension is the set of theorems over the union 
of W and a maximal set of default consequences (maximal in the sense that the 
addition of any other default consequence would falsify either the prerequisite 
or the justification condition). In other words, the default consequences that 
contribute to an extension come from a subset $D'$ of D such that the prerequisite
and justification conditions hold for every default of $D'$. Thus, the central theme
of the algorithm is to look for all subsets D' of D yielding extensions of A. 
Starting from D, the algorithm proceeds in three steps. 

Algorithm E 

1. (Consistency condition)³ All maximal consistent subsets of $W \cup CONS(D)$
that contain W are found. That is, a collection of subsets $D_1, \ldots, D_n$ of D is
found such that every $W \cup CONS(D_i)$ is maximally consistent for $1 \leq i \leq n$.
For every $D_i$ and every $\delta \in D \setminus D_i$, $W \cup CONS(D_i \cup \{\delta\})$ is inconsistent.

2. (Justification condition) The justification of every default $\delta$ of every previ-
ously computed $D_i$ is checked. This procedure yields maximal subsets $D_i^j$ of
$D_i$ such that $Th(W \cup CONS(D_i^j)) \cap \{\neg\beta\} = \emptyset$ for every $\beta \in JUST(D_i^j)$.

3. (Prerequisite condition) Maximal grounded sets of defaults are obtained by
eliminating the defaults that are not grounded from the previously computed
$D_i^j$. Testing groundedness for a default $\delta \in D_i^j$ consists of verifying that
$PREREQ(\{\delta\}) \in Th(W \cup CONS(D_i^j \setminus \{\delta\}))$. If $PREREQ(\{\delta\}) \in Th(W)$
then $\{\delta\}$ is grounded; if $PREREQ(\{\delta\})$ is derived from the consequences
of a subset $D'$ of $D_i^j \setminus \{\delta\}$ then the groundedness of $D'$ has to be checked.
Thereby, a sequence of verifications is generated for the defaults of $D_i^j$ such
that

- each default used for a verification is removed from $D_i^j$;

- $D_i^j$ is grounded iff every default of $D_i^j$ belongs to a sequence that validates
its prerequisite and the prerequisites of the first defaults of the sequence
belong to $Th(W)$ (i.e. removing defaults from the sequence after each
verification does not yield the empty set).

Note that testing the groundedness of $D_i^j$ costs no more than testing the
prerequisite condition on the defaults of $D_i^j$.

At the end of the process, only the maximal computed sets of defaults are re- 
tained as good candidates for sets of generating defaults (note these sets are 
already sets of generating defaults with respect to Lukaszewicz’s approach to 
default reasoning [9,12]). Following Theorem 1, R-extensions are produced by 
the sets of defaults for which the complementary set of defaults satisfies (ii). In 
other words, to obtain an extension, it is necessary to deal with the defaults that
are not involved in the construction of this extension, that is, it is necessary to
check whether these defaults can be removed from the set of defaults that may
yield an extension. It is only under this condition that this set of defaults can
be called a set of generating defaults under Reiter's approach.

³ Step 1 is not necessary for Algorithm E to be complete. It will be ignored below.

Algorithms such as Algorithm E can be inefficient. On many problem in-
stances, they rediscover extensions or they can rediscover that a certain subset
of defaults lacks an extension regardless of which other defaults are considered.
These inefficiencies can oftentimes be removed with divide-and-conquer tech-
niques that leave the completeness of Algorithm E intact. These techniques can
reduce the number of times that an extension is discovered by limiting the num- 
ber of permutations of the defaults and can reduce the rediscovery of failures by 
finding incompatibilities in sets of defaults early. The effect of the heuristic can 
also be pictured as pruning of the search tree that is generated by Algorithm E. 
We use this view in our presentation below. 



3 Theoretical aspects 

We first introduce the graph which represents pair-wise compatibility between 
defaults in a default theory. 

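Definition 3. For all defaults $\delta_1 = \frac{a : b}{c}$ and $\delta_2 = \frac{d : e}{f}$, where a, b, c, d, e, f are
propositional formulas, $\delta_1$ and $\delta_2$ are said to be incompatible iff $a \wedge d \vdash \bot$, or
$a \wedge e \vdash \bot$, or $a \wedge f \vdash \bot$, or $b \wedge d \vdash \bot$, or $b \wedge f \vdash \bot$, or $c \wedge d \vdash \bot$, or $c \wedge e \vdash \bot$,
or $c \wedge f \vdash \bot$. They are said to be compatible, otherwise.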

The incompatibility and compatibility relations are reflexive and symmetric.
When referring to the (in)compatibility of a default and itself, we will say that
a default is self-(in)compatible. It is noteworthy that W is not included on the
left-hand side of $\vdash$. The compatibility relation that we have defined above is
weaker (more defaults are compatible) than if W were added to the left-hand
side of each $\vdash$. Initial experimentation using this compatibility relation will be
followed by refinement of the heuristic.

All sets of generating defaults must be compatible with W. In the following,
we will denote $w_1 = \{\frac{\,:\,}{W}\}$. Considering W as a default is only a convenience to
make the definition of the compatibility graph simpler.

Definition 4. The compatibility graph, $G_\Delta(N_\Delta, E_\Delta)$, for a default theory
$\Delta = (W, D)$, is $N_\Delta = D \cup w_1$, the set of nodes, and $E_\Delta$, the set of edges. There is an
edge between two nodes $\delta_1$ and $\delta_2$ iff $\delta_1$ and $\delta_2$ are compatible.
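
Definitions 3 and 4 can be sketched in code as follows. This is only an illustration under a strong assumption: each component of a default is a conjunction of literals, represented as a frozenset of strings such as 'c' or '-c', so that the conjunction of two components is inconsistent exactly when it contains a complementary pair. The names `clash`, `incompatible` and `compatibility_graph`, and the small theory at the end, are hypothetical (the theory is not Example 1 of the paper).

```python
from itertools import combinations

def neg(lit):
    """Complement of a literal written as 'p' or '-p'."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def clash(x, y):
    """True iff the conjunction of the literal sets x and y is inconsistent."""
    both = x | y
    return any(neg(l) in both for l in both)

def incompatible(d1, d2):
    """The eight tests of Definition 3 for d1 = a:b/c and d2 = d:e/f."""
    (a, b, c), (d, e, f) = d1, d2
    pairs = [(a, d), (a, e), (a, f), (b, d), (b, f), (c, d), (c, e), (c, f)]
    return any(clash(x, y) for x, y in pairs)

def compatibility_graph(W, defaults):
    """Nodes are the named defaults plus w1 (W viewed as a default); edges join compatible nodes."""
    nodes = {"w1": (frozenset(), frozenset(), frozenset(W))}
    nodes.update(defaults)
    edges = {frozenset((m, n)) for m, n in combinations(nodes, 2)
             if not incompatible(nodes[m], nodes[n])}
    return set(nodes), edges

# Hypothetical theory: W = {a}, d1 = :b/b, d2 = :c/c, d3 = :-c/-c.
empty = frozenset()
defaults = {
    "d1": (empty, frozenset({"b"}), frozenset({"b"})),
    "d2": (empty, frozenset({"c"}), frozenset({"c"})),
    "d3": (empty, frozenset({"-c"}), frozenset({"-c"})),
}
nodes, edges = compatibility_graph({"a"}, defaults)
```

In this hypothetical theory only d2 and d3 are incompatible, so the graph has every edge except the one between them.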

Definition 5. A clique in a graph is a completely connected subgraph. The term 
is used throughout this paper to mean cliques which are maximal in the sense of 
subgraph containment. 

Cliques in the compatibility graph, which represent mutually pair-wise compat-
ible defaults, are the structures which are at the heart of our heuristic. If these
cliques are proper subgraphs of the compatibility graph, the original problem has
been successfully divided into simpler problems.
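
Maximal cliques in the sense of Definition 5 can be enumerated with the Bron and Kerbosch algorithm [2], which the implementation section of this paper reports using. The following is a minimal sketch of the basic variant without pivoting, not the code used in the experiments; the adjacency-dictionary representation and the four-node graph are assumptions made for illustration.

```python
def bron_kerbosch(adj):
    """Yield every maximal clique of an undirected graph given as {node: set_of_neighbours}."""
    def expand(r, p, x):
        # r: current clique, p: candidates that can extend it, x: nodes already explored
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())

# Hypothetical compatibility graph: d2 and d3 are the only incompatible pair.
adj = {
    "w1": {"d1", "d2", "d3"},
    "d1": {"w1", "d2", "d3"},
    "d2": {"w1", "d1"},
    "d3": {"w1", "d1"},
}
cliques = list(bron_kerbosch(adj))
```

Here the two maximal cliques are proper subgraphs of the graph, so the problem splits into two smaller subproblems.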

Because disjunctions in the components of the defaults can hide potentially 
useful incompatibility relations, the heuristic could benefit from information 
gained while the extension-building procedure is being done. We show how to 
propagate this information in the compatibility graph. 

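Definition 6. Information, $i = i_1, \ldots, i_n$, is propagated in a graph by modifying
the incompatibility relation to be $a \wedge d \vdash \bot$, or $a \wedge e \vdash \bot$, or $a \wedge (f \wedge \bigwedge i) \vdash \bot$, or
$b \wedge d \vdash \bot$, or $b \wedge (f \wedge \bigwedge i) \vdash \bot$, or $c \wedge d \vdash \bot$, or $c \wedge e \vdash \bot$, or $c \wedge (f \wedge \bigwedge i) \vdash \bot$.⁴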



Algorithm E*

If W is inconsistent
  Then print "W is inconsistent"
  Else Generate the compatibility graph G
       Find cliques $C_i$ in G
       For each clique $C_i$ do E*($C_i$)

E*($\Delta$)

For each possible order o (on nodes) of $\Delta$ do
  While the order o is not empty do
    % This is Step 2 of Algorithm E with information propagation added %
    If $(w \cup CONS(\delta_i \mid \delta_i \in \Delta)) \vdash \neg just(o(1))$
      Then Propagate $\neg just(o(1))$ in $\Delta$
           For each clique $C'$ obtained do E*($C'$)
           Empty the current order and erase all other orders with
           the same beginning
      Else Remove the current node of the current order
  If all defaults in extension are grounded and no destroying defaults exist
    % This is Step 3 of Algorithm E with test for R-extension added %
    Then keep the extension
    Else delete the extension



Example 1 . 

Given the default theory: A = ({A, D, G}, {< 5 i = ^^,82 = ^^,83 = ^^}), 
the compatibility graph has two cliques: $C_1 = \{w_1, \delta_2\}$ and $C_2 = \{w_1, \delta_1, \delta_3\}$.
Having these two cliques indicates the impossibility of finding $\delta_2$ and $\delta_3$ in the
same extension (their consequences are incompatible). Thus, the original prob-
lem of finding the correct $D'$s in the original set $D = \{\delta_1, \delta_2, \delta_3\}$ has been turned
into two simpler subproblems: finding the correct $D'$s in $C_1 - w_1$ and $C_2 - w_1$.

Example 2 . 

Given the default theory: A = ({A},{< 5 i = ^^,82 = ^^^,83 = ^^}), the 
compatibility graph has only one clique because the graph is complete. Having
one clique says that there is no incompatibility relation in this default theory.
So, no information is available to help Algorithm E*. Moreover, when Algorithm
E* discovers new information, propagating this information in the graph will
only remove one default each time. So, Algorithm E* will not be any better
than Algorithm E on this example. This is a very bad case for the heuristic, but
in this case, applying the heuristic is very inexpensive because it is easy to find
all cliques in a complete graph.

⁴ Propagating information into the prerequisite and justification of defaults is left to
future work.

Example 3 . 

We consider now the following example: A = {w = {A, B — 5 - -i( 7 , {C W D) — > 
-lA, P}, {( 5 i = ^^,62 = '§^1^3 = c'vg This default theory has no exten- 
sion. In this case. Algorithm E explores the complete search tree, and concludes, 
of course, that there is no extension, because all of the possible sets obtained
are "destroyed" by the application of the default $\delta_1$. When using Algorithm
E*, we have to look at two steps in particular. First, the compatibility graph
is complete. So, there is only one clique. It is given to Algorithm E*. Second,
the first step of Algorithm E* is to try to entail $\neg C$. It succeeds. This new in-
formation is propagated in the graph. This propagation generates two cliques,
because considering $\neg C$, $\delta_2$ and $\delta_3$ become incompatible because of D. Having
two cliques allows Algorithm E* to study the two subproblems $C_1 = \{w, \delta_2\}$
and $C_2 = \{w, \delta_3\}$. As in the previous example, Algorithm E* will conclude that
there is no extension (because of the destroying default $\delta_1$) without backtrack-
ing. Intuitively, propagating the new information says that, given this new
information, $\delta_2$ and $\delta_3$ cannot appear in the same extension. Of course, the
same thing happens if Algorithm E* tries to prove another justification instead
of $\neg C$.

Algorithm E* is only Algorithm E with the new heuristic. Of course, if there 
exist cases where our heuristic is not applied, then only Algorithm E is used. So in 
the following, we will consider that Algorithm E is proved. Clearly, our method is 
a heuristic: It only helps Algorithm E by removing parts of the search tree which 
are redundant, given the knowledge gained by the heuristic. So, sometimes the 
heuristic just simplifies the problem for Algorithm E. Each time that Algorithm 
E discovers another piece of structural information about some defaults, the 
heuristic tries to propagate it in the current clique representation. If propagating 
the information divides the current clique, Algorithm E continues with these
new cliques. If this is not the case, then Algorithm E continues. In each case
Algorithm E finishes. So, we will prove that at each step, the heuristic makes only 
good problem reductions, and gives back to Algorithm E coherent subproblems. 

Property 1 . Every set of generating defaults is contained in a clique. 

Proof. By definition, sets of generating defaults are sets of compatible defaults, 
so, these sets will be included in at least one clique. 

Property 2 . Every clique containing a set of generating defaults will be studied 
by E*. It is then impossible to forget one clique representing an extension. 
Proof. To prove this property it is sufficient to look at the algorithm. After
generating the compatibility graph, we are looking for cliques, and for each 
clique we call the E* function. 

Property 3. If the E* function is able to prove the negation of a justification, then 
propagating this new information in the clique cannot split a set of generating 
defaults. 

Proof. Suppose that propagating a new piece of information, $\neg b$, in the clique
splits the set of generating defaults. This means that the consequences $c_1, \ldots, c_n$
of a subset $\delta_1, \ldots, \delta_n$ of the set of generating defaults entail b
(if $\neg b \wedge c_1 \wedge \ldots \wedge c_n \wedge w \vdash \bot$ then $c_1 \wedge \ldots \wedge c_n \wedge w \vdash b$). So by monotonicity, the set of generating
defaults entails b but does not entail $\neg b$ (by definition a set of generating defaults
represents an extension and an extension is consistent). Therefore (W and the set
of consequences of generating defaults) is compatible with b, hence propagating
$\neg b$ cannot split the set of generating defaults.

Theorem 2. Algorithm E* is complete. 

Proof. The algorithm is recursive, so we will prove that at each step Algorithm 
E* is correct (each step is ultimately concluded by Algorithm E, a complete 
algorithm, if provided with supersets of generating defaults). We proved that 
the initial case is correct (Properties 1 and 2). Now, it is necessary only to prove 
that each time that a new piece of information is propagated in the E* function, 
then propagating this new information in the clique we are considering cannot 
lead to false or lost extensions. 

A : The clique contains no set of generating defaults. In this case 

• If the propagated information only removes one or more defaults in the 
clique, then the set obtained using the clique is the same as the set ob- 
tained by Algorithm E. The algorithm continues as if it were Algorithm 
E, so it is correct. 

• If the propagated information removes one or more defaults and splits the
set of remaining defaults into two or more cliques, then since the previous
clique contains no set of generating defaults, the new ones contain
no set of generating defaults either, and with each new set Algorithm E
will not find any extension, because it is complete.

B : The clique contains one or more sets of generating defaults. In this case 

• If the propagated information only removes one or more defaults in the 
clique, then the set obtained using the clique is the same as the set ob- 
tained by Algorithm E. The algorithm continues as if it were Algorithm 
E, so it is correct. 

• If the propagated information removes one or more defaults and splits 
the set of remaining defaults into two or more cliques, then by Property 
3 every set of generating defaults is contained in one of these new cliques 
and Algorithm E continues with all of these new sets, so it is correct. 
4 Description of DeReS

DeReS produces sets of generating defaults by searching a full binary tree rep- 
resenting all the subsets of the default rules in D. The binary tree is structured 
in the following way: The root is labelled as $\emptyset$. The right child of each node is
labelled with the same label as its parent. The left child of each node is labelled 
with the set that represents its parent unioned with the default that is being 
added by the current level. Of importance to the discussion that follows is that 
the complete default theory is represented by the tree’s leftmost leaf node. 
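
The labelling of this binary tree can be sketched as follows. This is an illustrative enumeration only and is not DeReS's actual code; the function name `subset_tree_leaves` and the use of default names as strings are assumptions. The leaves are produced from left to right, so the first leaf is the complete set of defaults and the last is the empty set.

```python
def subset_tree_leaves(defaults):
    """Yield the leaf labels of the full binary tree over `defaults`, leftmost leaf first."""
    def walk(label, level):
        if level == len(defaults):
            yield label
            return
        # Left child: the parent's label plus the default added at this level.
        yield from walk(label | {defaults[level]}, level + 1)
        # Right child: same label as the parent.
        yield from walk(label, level + 1)
    yield from walk(frozenset(), 0)

leaves = list(subset_tree_leaves(["d1", "d2"]))
```

With n defaults the tree has 2^n leaves, which is the search space that the pruning techniques described next aim to reduce.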

The heuristic used by DeReS is relaxed stratification, an idea influenced by
Logic Programming.⁵ Relaxed stratification uses a syntactically-based partition
$\{D_1, \ldots, D_n\}$ of the default theory: propositional variables appearing in defaults
from $D_i$ do not appear in the consequence of defaults from $D_j$, for i < j, and
no set $D_i$ can be further partitioned preserving the constraint on variable oc-
currence. Also, formulas in W do not have common propositional variables with
the consequents of the defaults. Extensions for the default theory are computed
by letting $W_1 = W$ and incrementally finding extensions for the default theories
$(W_i, D_i)$, where $W_i = W_{i-1}$ unioned with the consequences of the generating
defaults for $(W_{i-1}, D_{i-1})$. Three types of pruning are achieved by DeReS using
relaxed stratification. Firstly, if a node in the binary tree represents an exten- 
sion, the left and right subtrees can be pruned, since extensions are maximal. 
Secondly, relaxed stratification is a divide-and-conquer technique. It implicitly
prunes the search tree by not considering combinations of defaults that are cho-
sen from different strata. Thirdly, strategic location of destroying defaults in the 
strata can prune subtrees, preventing the rediscovery of extensions that would 
be destroyed by the destroying default. 

5 Implementation of the Pair-wise Compatibility 
Heuristic 

We are currently using a loosely-coupled hybrid system to obtain the preliminary 
results reported here. To produce the pair-wise compatibility graph we are using 
a simple inspection method, because the defaults in the problems that we have 
studied are simple enough to allow this. We are using the Bron and Kerbosch 
algorithm, known to be the best algorithm for finding all cliques in a graph [2]. It
produces each clique in a small constant time and most importantly it produces 
each clique once. Given these two factors, the computation of the cliques adds 
almost nothing to the time to compute extensions. (This preprocessing time 
is not reported in any of the run times.) We also use the theorem prover and 
some other parts of the DeReS (Version 1.3) program for our timing of the pair-
wise compatibility heuristic. Doing so means that the differences in run times
are due solely to the effects of the heuristics on the extension-building. (Because
Algorithm E* builds from D to the sets of generating defaults and DeReS builds
from $\emptyset$ to the sets of generating defaults, it is not clear what computational
effects the two search strategies will ultimately have.)

⁵ The language used in [4] has a strong Logic Programming flavour: one section is
titled "Programming with default logic".

To simulate the algorithm, with the pair-wise compatibility heuristic embed- 
ded properly, generating all of the extensions of a default theory, the hybrid 
method first calculates all of the cliques of the compatibility graph and then 
gives each clique to DeReS as a separate problem. The run times that we report 
are simply the sum of the times taken by all of the individual runs. 

We have used this hybrid approach for a number of reasons. Firstly, as men- 
tioned above, we didn’t want any of the comparison to be biased due to parts 
of the implementation that have nothing to do with the heuristics. So, for the 
two heuristics, the extension-building engine is DeReS. Secondly, we are inter- 
ested in studying the two heuristics combined. Using DeReS gives us access to 
relaxed stratification. (We currently can only conjecture that the two heuristics 
are independent so they can be used concurrently.) Thirdly, the University of 
Kentucky website described in [4] contains a number of default theories set up 
for use with DeReS. These publicly available test cases are a reasonable place 
to begin experimentation. Fourthly, we have only experimented with disjunction-free
theories. So, the incremental part of the compatibility heuristic is not being
tested. When we move to this phase of the experimentation, we will need to 
move away from this hybrid approach. 



6 Experiments and Results 

6.1 Experimental Background 

The University of Kentucky website described in [4] contains a number of default 
theories encoded for use by DeReS. We have used two sets of problems contained 
there to give us the preliminary results that we report below. The first problem 
set contains default theories that represent maximum independent sets (when 
the problem is represented as a graph, the maximum independent set problem 
is the dual of the clique problem). The representation is discussed in [4]. What 
makes this problem set interesting is that there are no self-incompatible defaults 
and no problem has a non-trivial stratification. The second problem set contains 
default theories whose extensions are precisely the kernels in directed graphs. 
What makes this problem set interesting is that the default theories include self- 
incompatible defaults (more than half of the defaults) and that the theories are 
finely stratified. W is the empty set for both problem sets. 



6.2 Results 

The four ‘maximum independent sets’ theories contain 10, 20, 30, and 40 de- 
faults. Excluding the 10- and 20-default cases (the results are almost the same), 
a summary of the run times when using the pair-wise compatibility heuristic
compared to using the relaxed stratification heuristic is given in Table 1. The spread
in running times increases with the number of defaults. The relaxed stratifica-
tion heuristic does not modify the original default theories in any way. On the
other hand, the 2101 and 15,380 cliques produced by the pair-wise compatibility
heuristic for the 30- and 40-default theories, respectively, are precisely the sets
of generating defaults for the extensions. This situation is the best possible for
the pair-wise compatibility heuristic, and the worst possible for relaxed strat-
ification. Because of the way that we have designed the experiment, only the
heuristics account for the differences in run times⁶.

The seven ‘kernel’ theories contain 80, 175, 252, 343, 448, 567, and 700 defaults.
We studied only the first. It has 80 defaults, 48 being self-incompatible
defaults. Each default theory consists of a number of prerequisite-free normal
defaults, one whose consequence is $a_i$ and one whose consequence is $\neg a_i$ for
propositional letters $a_1, \ldots, a_n$, where $n$ depends on the theory. There are also
a number of non-normal defaults which are self-incompatible defaults. The
prerequisites of these self-incompatible defaults are conjunctions of some of the
$a_i$ and $\neg a_i$ found in the normal defaults. The 65,536 cliques are then just all
possible maximal combinations of the normal defaults whose consequences are 
consistent. Six extensions result. The results are summarized in Table 1. DeReS 
with relaxed stratification performs very well (the theory is highly stratified). 
If the cliques^ produced by the pair-wise compatibility heuristic are all blindly 
given to DeReS, the results are very poor. The results shown here use stratification
also. However, if the cliques are pruned using the information provided by
the self-incompatible defaults (we simulate this pruning), then almost no search
for the extensions is needed (the result in the table is shown as an approximation
of the true run time). The last two columns show what happens to the run times
if the self-incompatible defaults are moved so that they all follow the normal 
defaults, and if stratification is turned off. 
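
The clique count for the kernel theory can be checked with a small sketch. It assumes, as the description above suggests, that only the normal defaults enter the compatibility graph and that two of them are compatible exactly when their consequences are not complementary. The function names are ours, and the value n = 16 is inferred from 80 - 48 = 32 normal defaults rather than stated in the text.

```python
def maximal_cliques(adj):
    """Bron-Kerbosch enumeration; adj maps each vertex to its set of neighbours."""
    cliques = []

    def expand(R, P, X):
        if not P and not X:
            cliques.append(frozenset(R))
            return
        for v in list(P):
            expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    expand(set(), set(adj), set())
    return cliques


def compatibility_graph(n):
    """Vertices stand for the consequences a_i / not a_i of the 2n normal defaults.

    Assumption: two normal defaults are compatible iff they talk about
    different letters (a_i and not a_i are the only clashing pair).
    """
    lits = [(i, sign) for i in range(1, n + 1) for sign in (True, False)]
    return {u: {v for v in lits if v[0] != u[0]} for u in lits}


# Small instance: 4 letters give 2**4 maximal cliques, each picking one literal per letter.
small = maximal_cliques(compatibility_graph(4))
n_kernel = (80 - 48) // 2          # inferred number of letters in the 80-default theory
```

Under these assumptions every maximal clique chooses one of $a_i$, $\neg a_i$ per letter, so the number of cliques is $2^n$, which for 16 letters agrees with the 65,536 cliques reported above.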



6.3 Discussion of Results and Experiments 

When interpreting these results it should be noted that W is empty and that, because
the theories are disjunction-free and propositional, we have been able to use table
lookup methods instead of theorem proving, since the representation allows this
very simple form of proof procedure. Questions regarding the time to compute
the cliques arise. At this time we are unable to answer this type of question with 
any authority. The overhead from the theorem prover is $O(n^2)$, where $n$ is the
number of defaults, and the time to compute the cliques is just the overhead of
the clique algorithm itself. What to do with a non-empty W is a very interesting

® Times for the computation of the stratification and the compatibility graph are not 
included. 

^ Because we used the DeReS tree search engine, what is given to DeReS are the 
cliques unioned with the set of self-incompatible defaults, added in the same relative 
locations as in the original theory. Technically, all of the defaults not in the set of 
generating defaults are potentially destroying defaults, but because of the particular 
structure of all of the defaults in this theory, only the self-incompatible ones can 
possibly be destroying. 




590 R.E. Mercer, L. Forget, and V. Risch 



Table 1. Run times (in seconds) of various experiments on three default theories. 



Theory      |D|  Relaxed         Cliques  Cliques   Self-incompatible  No
                 Stratification           (pruned)  at end of theory   Stratification
MaxIndSets   30  9.99            0.5      -         -                  -
MaxIndSets   40  264.86          5.65     -         -                  -
Kernel       80  0.022           3.78     0         1.97               36.19


problem. The compatibility relation has been defined purposefully without W, 
in order that the relation is easy to compute. Whether this is a good decision is 
left for future work. 

During the testing of the ‘maximum independent sets’ theories, we discovered 
that DeReS does not make one very important prune.® If the complete default 
theory is an extension, DeReS continues to search the complete tree. Relaxed 
stratification requires the tree to be built from the empty set to sets of generating 
defaults, as described in Section 4. This means that if D is a set of generating 
defaults, the left-most leaf node in the tree represents an extension and no other 
nodes in the tree can represent an extension. So, the rest of the tree can be 
pruned. This prune generalizes to being able to prune any subtree whose left- 
most leaf is a set of generating defaults. Any method that constructs from the 
full default theory to sets of generating defaults at the leaves of the tree does 
this prune automatically. 

We are no longer certain that some of the combinatorial graph problems 
make a good testbed for testing general heuristics. These problems have too 
much structure that can be taken advantage of by specialized heuristics. One 
outcome of these and future experiments may be that both general and special- 
ized heuristics are desirable. 

7 Conclusions and Future Work

Our contribution to improved default theory extension-building algorithms has 
been a heuristic which can divide the original problem into smaller problems by 
discovering structure in the default theory with less computational cost. 

In this paper we have presented an initial comparison of two heuristics, the 
compatibility heuristic that we have introduced in this paper and the relaxed 
stratification heuristic found in the DeReS algorithm [4]. Although much remains
to be done in such a comparison, we have indicated that these two heuristics take 
advantage of very different structural aspects of the default theories. The com- 
patibility heuristic seeks out semantic relations (compatibility is defined in terms 

® We have not confirmed with the authors of DeReS, but given the datasets on which 
their implementation was tested, this would have been an easy prune to have missed.
of consistency) among defaults, while relaxed stratification takes advantage of
the syntactic presentation of the default theory. 

Both heuristics seek to reduce the problem but in different ways. The com- 
patibility heuristic reduces the number of defaults that need to be considered 
when constructing an extension by creating initial clusters of defaults which are 
the only clusters that can contain extensions. Relaxed stratification takes ad- 
vantage of a reduced interaction among defaults to incrementally build parts of 
extensions which then form extensions by a simple union.

Abstract. Consistency-based approaches in nonmonotonic reasoning may be expected
to yield multiple sets of default conclusions for a given default theory.
Reasoning about such extensions is carried out at the meta-level. In this paper, we
show how such reasoning may be carried out at the object level for a large class
of default theories. Essentially we show how one can translate a (normal) default
theory $\Delta$, obtaining a second theory $\Delta'$, such that $\Delta'$ has a single
extension that encodes every extension of $\Delta$. Moreover, our translated theory is
only a constant factor larger than the original (with the exception of unique names
axioms). We prove that our translation behaves correctly. In the approach we can now
encode the notion of extension from within the framework of standard default logic.
Hence one can encode notions such as skeptical and credulous conclusions, and can reason
about such conclusions within a single extension. This result has some theoretical
interest, in that it shows how multiple extensions of normal default theories are
encodable with manageable overhead in a single extension.



1 Introduction 

In nonmonotonic reasoning, in so-called consistency-based approaches such as default 
logic [9] and autoepistemic logic [6], one typically obtains not just a single set of de- 
fault conclusions, but rather multiple sets of candidate default conclusions. Consider 
the by-now hackneyed example wherein Quakers are normally pacifist, republicans are 
normally not, along with adults are normally employed. Assume as well that someone 
is a Quaker, republican, and an adult. In default logic (see Section 2) this can be
encoded by $(\{\frac{Q : P}{P}, \frac{R : \neg P}{\neg P}, \frac{A : E}{E}\}, \{Q, R, A\})$.
This theory has two extensions or sets of default conclusions, one containing
$\{Q, R, A, E, P\}$ and the other $\{Q, R, A, E, \neg P\}$.
In autoepistemic logic the same example appropriately encoded yields two analogous 
expansions or possible belief sets. 

Reasoning about these extensions (resp. expansions) is carried out at the meta-level: 
a default conclusion that appears in some extension (such as P) is called a credulous 
(or brave) default conclusion, while one that appears in every extension (such as E) is 
called a skeptical conclusion. Intuitively it might seem that skeptical inference is the 
more useful notion. However, this is not necessarily the case. In diagnosis from first 

* Affiliated with Simon Fraser University, Burnaby, Canada.
principles [10] for example, in one encoding there is a 1-1 correspondence between
diagnoses and extensions of the (encoding) normal default theory. Hence one may want 
to carry out further reasoning to determine which diagnosis to pursue. More generally 
there may be reasons to prefer some extensions over others, or to somehow synthesize 
the information found in several extensions. 

In this paper, we show how such reasoning can be carried out at the object level. For
a default theory $\Delta = (D, W)$, we translate $\Delta$ to obtain a second theory
$\Delta' = (D', W')$, such that $\Delta'$ has a single extension that encodes every
extension of $\Delta$. Given this, one can express in the theory what it means for
something to be a skeptical or credulous default conclusion. Our result isn’t
completely general; however it applies to normal default theories. The translation has
several desirable properties. The translated theory $\Delta'$ is only a constant factor
larger than the original $\Delta$, with the exception of introduced unique names
axioms. As well, we prove that our translation behaves correctly.

We first show for a set of defaults $D_m$ how, using an encoding, we can detect the
case wherein all defaults in $D_m$ apply. From this, for a default theory
$(D \cup D_m, W)$ we show how to obtain a second theory wherein (informally) either
all of the defaults in $D_m$ are applied en masse (if possible) or none of them are.
This is done by naming each of the defaults in $D_m$, and then expressing in default
logic the applicability conditions for the defaults. We develop this in Section 3. In
Section 4 we present our main result, where we show how a default theory can be
translated into a second theory whose extension encodes the extensions of the
original. Roughly, we provide an axiomatisation that “locates” maximal sets of
applicable defaults; for such a set, the set of default conclusions is “tagged” with
the set name, to distinguish it from other instances. For example, in our original
example, let $m_{1,3}$ be the name of the set $\{\frac{Q : P}{P}, \frac{A : E}{E}\}$ and
$m_{2,3}$ be the name of $\{\frac{R : \neg P}{\neg P}, \frac{A : E}{E}\}$. These are maximal
applicable sets of defaults, and from our translation we would obtain a single
extension containing $\{Q(m_{1,3}), Q(m_{2,3}), R(m_{1,3}), R(m_{2,3}), A(m_{1,3}), A(m_{2,3}),
E(m_{1,3}), E(m_{2,3}), P(m_{1,3}), \neg P(m_{2,3})\}$. As mentioned, we are able to prove
that our translations in fact accomplish what is claimed.

The advantage of this approach is that we can encode the notion of extension within 
the framework of standard default logic. Hence one can reason about (skeptical and 
credulous) conclusions within the framework of a single extension of a default theory. 
Thus for example, in a diagnosis setting one could go on and axiomatise notions of 
preference among diagnoses having to do with, perhaps, number of faulty components, 
or based on components expected to fail first. This result has some theoretical interest, 
in that it shows (for theories that we consider) how multiple extensions are encodable, 
with no significant overhead in a single extension. The overall approach is similar to that 
of [2]. 



2 Default Logic 



Default logic [9] augments classical logic by default rules of the form
$\frac{\alpha : \beta}{\gamma}$. A default rule is normal if $\beta$ is equivalent to
$\gamma$; it is semi-normal if $\beta$ implies $\gamma$. We sometimes denote the
prerequisite $\alpha$ of a default $\delta$ by $PRE(\delta)$, its justification $\beta$
by $JUS(\delta)$, and its consequent $\gamma$ by $CON(\delta)$. Accordingly, $PRE(D)$ is
the set of prerequisites of all defaults in $D$; $JUS(D)$ and $CON(D)$ are defined
analogously. Empty components, such as no prerequisite or even no justifications, are
assumed to be tautological. Semantically, defaults with unbound variables are taken to
stand for all corresponding instances. A set of default rules $D$ and a set of
formulas $W$ form a default theory $(D, W)$ that may induce a single or multiple
extensions in the following way [9].

Definition 1. Let $\Delta = (D, W)$ be a default theory. For any set of formulas $S$,
let $\Gamma_\Delta(S)$ be the smallest set of formulas $S'$ such that

1. $W \subseteq S'$,

2. $Th(S') = S'$,

3. For any $\frac{\alpha : \beta}{\gamma} \in D$, if $\alpha \in S'$ and $\neg\beta \notin S$ then $\gamma \in S'$.

A set of formulas $E$ is an extension of $\Delta$ iff $\Gamma_\Delta(E) = E$.

Any such extension represents a possible set of beliefs about the world at hand. Further,
define for a set of formulas $S$ and a set of defaults $D$, the set of generating default rules
as $GD(D, S) = \{\delta \in D \mid PRE(\delta) \in S \text{ and } \neg JUS(\delta) \notin S\}$.
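
For small theories, Definition 1 and the generating defaults can be explored by brute force. The sketch below is only an illustration under strong assumptions that are ours, not the paper's: formulas are restricted to literals (strings such as "P" and "-P"), prerequisites are sets of literals, $W$ is consistent, and a candidate belief set is represented by the literals it contains. It guesses a set $G$ of defaults, forms $E = Th(W \cup CON(G))$, and keeps $E$ when $G = GD(D, E)$ and the defaults in $G$ can be fired one after another starting from $W$.

```python
from itertools import combinations


def neg(lit):
    """Complement of a literal written as 'P' or '-P'."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def consistent(lits):
    return not any(neg(l) in lits for l in lits)


def generating_defaults(D, E):
    """GD(D, E) for a consistent, literal-only belief set E."""
    return {n for n, (pre, jus, con) in D.items() if pre <= E and neg(jus) not in E}


def grounded(D, G, W):
    """Can the defaults in G be fired successively, starting from W alone?"""
    derived, todo = set(W), set(G)
    while todo:
        ready = {n for n in todo if D[n][0] <= derived}
        if not ready:
            return False
        derived |= {D[n][2] for n in ready}
        todo -= ready
    return True


def extensions(D, W):
    """All extensions of (D, W), each given by the literals it contains."""
    found = []
    for r in range(len(D) + 1):
        for G in map(set, combinations(sorted(D), r)):
            E = set(W) | {D[n][2] for n in G}
            if consistent(E) and generating_defaults(D, E) == G and grounded(D, G, W):
                found.append(E)
    return found


# The Quaker/republican/adult theory from the introduction (default names are ours).
quaker = {"d1": ({"Q"}, "P", "P"), "d2": ({"R"}, "-P", "-P"), "d3": ({"A"}, "E", "E")}
exts = extensions(quaker, {"Q", "R", "A"})
skeptical = set.intersection(*exts)   # literals in every extension
credulous = set.union(*exts)          # literals in some extension
```

Here the skeptical and credulous conclusions are computed outside the logic, at the meta-level; the remainder of the paper is about expressing the same distinction inside a single extension.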

3 Applying All, or None, of a Set of Defaults 

In this section we consider the problem of how to apply all defaults in some set, or none
in the set. We will thus work with default theories $(D, W)$ having some distinguished
finite subset $D_m \subseteq D$. To make the set $D_m$ explicit, we denote such theories by
$(D \cup D_m, W)$. The idea is that we wish to obtain extensions of $(D \cup D_m, W)$
subject to the constraint that all defaults in $D_m$ are applied, or none are. For example, in
the theory ({^} U } ,0) we would want to obtain an extension contain- 
ing A, but not B (since both defaults in {^, } cannot be jointly applied). For 

({ "X } { "X’ XX } ’ {^}) would want to obtain an extension containing A, B, 

and D. 

We begin by associating a unique name with each default. This is done by extending
the original language by a set of constants¹ $N$ such that there is a bijective mapping
$n : D \to N$. We write $n_\delta$ instead of $n(\delta)$ (and we often abbreviate
$n_{\delta_i}$ by $n_i$ to ease notation). Also, for default $\delta$ along with its name $n$,
we sometimes write $n : \delta$ to render naming explicit. To encode the fact that we deal
with a finite set of distinct default rules, we adopt a unique names assertion ($UNA_N$)
and domain closure assertion ($DCA_N$) with respect to $N$. So, for a name set
$N = \{n_1, \ldots, n_k\}$, we add axioms

$UNA_N$: $\neg(n_i = n_j)$ for all $n_i, n_j \in N$ with $i \neq j$
$DCA_N$: $\forall x.\ name(x) \equiv (x = n_1 \vee \cdots \vee x = n_k)$.



We write $\forall x \in N.\ P(x)$ for $\forall x.\ name(x) \supset P(x)$.

We introduce a new constant $m$ as the name of the designated rule set $D_m$. We relate
the name of the rule set denoted by $m$ with the names of its members by introducing a
binary predicate in, where $in(x, y)$ is true just if the default named by $x$ is a member of

¹ [5] first suggested naming defaults using a set of aspect functions. See also [8,1].
the set named by $y$. In this section, instances of in will be of the form $in(\cdot, m)$. While
we could get away with not using in (and $m$) here, this additional machinery is required
in Section 4, and it is most straightforward to introduce it here. Note that we do not
need a full axiomatization of in, representing set membership, since we use it in a very
restricted fashion.

For applying all, or none, of the defaults in $D_m$, we need to be able to, first, detect
when a rule has been applied or is blocked and, second, control the application of a rule
based on other prerequisite conditions. There are two cases for a default
$\frac{\alpha : \beta}{\gamma}$ to not be applied: the prerequisite is not known to be true
(and so its negation $\neg\alpha$ is consistent), or the justification is not consistent
(and so its negation $\neg\beta$ is derivable). For detecting this case, we introduce a new,
special-purpose predicate bl/1. Similarly we introduce a special-purpose predicate ap/1
to detect when a rule has been applied. For controlling application of a rule we
introduce predicates ok/1 and ko/1.

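We are given a default theory $(D \cup D_m, W)$ over language $\mathcal{L}$ and its set
of associated default names $N \cup \{m\}$.² Let

$$D_m = \left\{ n_j : \frac{\alpha_j : \beta_j}{\gamma_j} \;\middle|\; j = 1..k \right\}.$$

(For simplicity, we reuse the symbols $j$, $k$, $m$, $n_j$, $\alpha_j$, etc. below.) We define
$\mathcal{S}_m((D \cup D_m, W)) = (D', W')$ over $\mathcal{L}^*$, obtained by extending
$\mathcal{L}$ to $\mathcal{L}^*$ with new predicate symbols ok/1, ko/1, bl/1, ap/1, and
names $N \cup \{m\}$, as follows:

$$D' = D \cup D_N \cup D_M$$
$$W' = W \cup W_M \cup \{DCA_N, UNA_N\}$$

where

$$D_N = \left\{ \frac{\alpha_j \wedge ok(n_j) : \beta_j}{\gamma_j \wedge ap(n_j)} \;\middle|\; j = 1..k \right\} \tag{1}$$
$$D_M = \left\{ \frac{: \neg ko(m)}{ok(n_1) \wedge \cdots \wedge ok(n_k)} \right\} \tag{2}$$
$$\qquad \cup \left\{ \frac{: \neg\alpha_j}{bl(m)},\ \frac{(\gamma_1 \wedge \cdots \wedge \gamma_k) \supset \neg\beta_j :}{bl(m)} \;\middle|\; j = 1..k \right\} \tag{3}$$
$$W_M = \{\forall x \in N.\ in(x, m) \equiv (x = n_1 \vee \cdots \vee x = n_k)\} \tag{4}$$
$$\qquad \cup \{bl(m) \supset ko(m)\} \tag{5}$$
$$\qquad \cup \{(\forall x \in N.\ in(x, m) \supset ap(x)) \supset ap(m)\} \tag{6}$$
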

Clearly, $D_N$ contains the images of the original rules in $D_m$. Each rule
$\delta_j \in D_m$ is applicable if $ok(n_j)$ is derivable. In fact, we assert $ok(n_j)$ for
every $\delta_j \in D_m$, unless we cannot jointly apply all rules of $D_m$. That is, before
activating the constituent rules, we have to make sure that none of them will be
blocked. This is accomplished through the justification $\neg ko(m)$ in (2) together with
Axiom (5). We block Rule (2) (and with it the derivability of all $ok(n_j)$) when we
detect that one of $\delta_1, \ldots, \delta_k$ is blocked. That is, $ko(m)$ will be an
immediate consequence of $bl(m)$.

Now, we have that $D_m$ is blocked ($bl(m)$) just if some rule in $D_m$ is blocked.
However, since we must control a whole set of defaults, we must check for the blockage


² We let $\cup$ stand for disjoint union.
of one of the constituent default rules in the context of all other rules in the set applying.
For detecting the failure of consistency, we verify for $\beta_j$ and some set of formulas $S$
(cf. Definition 1), whether $S \cup \{\gamma_1, \ldots, \gamma_k\} \vdash \neg\beta_j$ rather than
$S \vdash \neg\beta_j$. This motivates the prerequisite of the second rule in (3). This context,
$(\gamma_1 \wedge \cdots \wedge \gamma_k)$, is not needed for detecting the failure of
derivability by means of the first rule in (3), since this test is effectuated with
respect to the final extension $E$ via $\neg\neg\alpha_j \notin E$.

Finally, as given in (6), $D_m$ is applied ($ap(m)$) just if every rule in $D_m$ is applied; it
is only in this last case that the consequents of the constituent rules in $D_m$ are asserted.

Consider theory $(D \cup D_m, W)$, where

$$D = \left\{ \frac{: E}{E} \right\}, \quad D_m = \left\{ n_1 : \frac{: P}{P},\ n_2 : \frac{: S}{S} \right\}. \tag{7}$$

For $D_N$ and $D_M$, we obtain (after simplifying and removing redundant defaults):

$$\frac{ok(n_1) : P}{P \wedge ap(n_1)}, \quad \frac{ok(n_2) : S}{S \wedge ap(n_2)}, \quad \frac{: \neg ko(m)}{ok(n_1) \wedge ok(n_2)}, \quad \frac{(\neg P \vee \neg S) :}{bl(m)}$$

The in predicate has instances $in(n_1, m)$ and $in(n_2, m)$. From (6) we can deduce
$[ap(n_1) \wedge ap(n_2)] \supset ap(m)$.

Let $W = \{\neg(P \wedge E \wedge S)\}$. We obtain two extensions, one containing $P$, $S$,
$\neg E$ and the other containing $E$, $\neg(P \wedge S)$. For the first case, we obtain
$ok(n_1)$ and $ok(n_2)$. If both $\delta_1$ and $\delta_2$ are applicable (which they are) then
we conclude $P \wedge ap(n_1)$ and $S \wedge ap(n_2)$ as well as $ap(m)$. From this we get $P$
and $S$ and so $\neg E$. For the other extension, if the default $\frac{: E}{E}$ is applied,
then $\neg P \vee \neg S$ is derivable, and so $\frac{(\neg P \vee \neg S) :}{bl(m)}$ is
applicable, from which we obtain $bl(m)$, and so $ko(m)$, blocking application of
$\frac{: \neg ko(m)}{ok(n_1) \wedge ok(n_2)}$. Consequently neither $\delta_1$ nor $\delta_2$
can be applied.
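
The behaviour described for theory (7) can be replayed on a propositional rendering of the translation. This is a hedged sketch rather than the construction itself: the first-order machinery (names, in, $DCA_N$, $UNA_N$, Axiom (4)) is grounded away into atoms such as `ok_n1` and `bl_m`, formulas are Python predicates over truth assignments, and extensions are found by exhaustive search. All function and atom names are ours.

```python
from itertools import combinations, product

TOP = lambda v: True                      # tautology (empty prerequisite or justification)


def atom(name):
    return lambda v: v[name]


def extensions(D, W, atoms):
    """Brute-force Reiter extensions of (D, W); each is returned as its list of models."""
    worlds = [dict(zip(atoms, bits)) for bits in product([False, True], repeat=len(atoms))]
    base = [v for v in worlds if all(f(v) for f in W)]            # models of W
    found = []
    for r in range(len(D) + 1):
        for G in combinations(range(len(D)), r):                  # guess generating defaults
            E = [v for v in base if all(D[i][2](v) for i in G)]   # models of W + CON(G)
            S, fired, grew = list(base), set(), True
            while grew:                                           # rebuild Gamma(E) from W
                grew = False
                for i, (pre, jus, con) in enumerate(D):
                    if i not in fired and all(pre(v) for v in S) and any(jus(v) for v in E):
                        S = [v for v in S if con(v)]
                        fired.add(i)
                        grew = True
            if S == E and E not in found:
                found.append(E)
    return found


def translate(D, Dm, W):
    """Grounded version of S_m: D_N as in (1), D_M as in (2)-(3), W_M as in (5)-(6)."""
    ok = [f"ok_n{j + 1}" for j in range(len(Dm))]
    ap = [f"ap_n{j + 1}" for j in range(len(Dm))]
    all_cons = lambda v: all(c(v) for _, _, c in Dm)
    D_N = [(lambda v, a=a, o=o: a(v) and v[o], b, lambda v, c=c, p=p: c(v) and v[p])
           for (a, b, c), o, p in zip(Dm, ok, ap)]
    D_M = [(TOP, lambda v: not v["ko_m"], lambda v: all(v[o] for o in ok))]
    for a, b, _ in Dm:
        D_M.append((TOP, lambda v, a=a: not a(v), atom("bl_m")))
        D_M.append((lambda v, b=b: not (all_cons(v) and b(v)), TOP, atom("bl_m")))
    W_M = [lambda v: not v["bl_m"] or v["ko_m"],
           lambda v: not all(v[p] for p in ap) or v["ap_m"]]
    return D + D_N + D_M, W + W_M, ok + ap + ["ap_m", "bl_m", "ko_m"]


def entailed(ext, f):
    return all(f(v) for v in ext)


# Theory (7) with W = {not (P and E and S)}; defaults are (prerequisite, justification, consequent).
D = [(TOP, atom("E"), atom("E"))]
Dm = [(TOP, atom("P"), atom("P")), (TOP, atom("S"), atom("S"))]
W = [lambda v: not (v["P"] and v["E"] and v["S"])]
D2, W2, new_atoms = translate(D, Dm, W)
exts = extensions(D2, W2, ["P", "S", "E"] + new_atoms)
```

The search is exponential in the number of defaults and atoms, so it is only meant for toy theories of this size; it does not use the simplifications applied in the text, which is why redundant rules from (3) are kept.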

In the next example, defaults inside a set depend upon each other. Consider
$(\emptyset \cup D_m, \emptyset)$ with

$$D_m = \left\{ n_1 : \frac{: Q}{Q},\ n_2 : \frac{Q : R}{R} \right\}.$$

We get for $D_N$ and $D_M$ the following rules.

$$\frac{ok(n_1) : Q}{Q \wedge ap(n_1)}, \quad \frac{Q \wedge ok(n_2) : R}{R \wedge ap(n_2)}, \quad \frac{: \neg ko(m)}{ok(n_1) \wedge ok(n_2)}, \quad \frac{(\neg Q \vee \neg R) :}{bl(m)}, \quad \frac{: \neg Q}{bl(m)}.$$

We obtain $ok(n_1)$ and $ok(n_2)$, which allow us to apply default $\delta_1$, yielding in turn
$Q \wedge ap(n_1)$. Given $Q$, we can now apply default $\delta_2$, yielding $R \wedge ap(n_2)$.
This allows us to deduce $ap(m)$. We thus get an extension containing $Q$ and $R$.

The last example also shows why we cannot avoid the translation by replacing Dm 
by JUS{S) Section 4, this replacement would result in 

an exponential blowup in the encoding. 

The next theorem summarizes properties of our approach, and shows that rules are 
applied either en masse, or not at all. 

Theorem 1. Let $E$ be a consistent extension of $\mathcal{S}_m((D \cup D_m, W))$ for default
theory $(D \cup D_m, W)$. We have that:

1. $ap(m) \in E$ iff $\{ap(n_\delta) \mid \delta \in D_m\} \cup CON(D_m) \subseteq E$

2. $bl(m) \in E$ iff $\{ap(n_\delta) \mid \delta \in D_m\} \not\subseteq E$

3. $ok(n_\delta) \in E$ iff $ap(n_\delta) \in E$

4. ok(n 5 ) G E for all 5 G Dm ^ E 

5. $ap(n_\delta) \in E$ implies $(ap(m) \wedge in(n_\delta, m)) \in E$ for some $\delta \in D_m$

6. $ap(n_\delta) \in E$ for $\delta \in D_m$ iff $\{ap(n_\delta) \mid \delta \in D_m\} \subseteq E$.

Theorem 2. For default theory $(\emptyset \cup D, W)$, we have that
$\mathcal{S}_m((\emptyset \cup D, W))$ has extension $E$ where either
$E \cap \mathcal{L} = Th(W \cup CON(D))$ or else $E \cap \mathcal{L} = Th(W)$.

The default theory (0 U { has an extension E where E f \ L = Th{%). 

Theorem 3. Let $(D, W)$ be a (standard) default theory over $\mathcal{L}$ with extension $E$
and (respective) set of generating defaults $GD(D, E)$. Then
$\mathcal{S}_m((\emptyset \cup GD(D, E), W))$ has extension $E'$ where $E = E' \cap \mathcal{L}$.

4 Encoding Extensions Using Sets 

For encoding extensions of a normal default theory (D,W), we use the machinery 
developed in the previous section to determine maximal (with respect to set inclusion) 
sets of applicable defaults. Names are introduced for each subset of D, and for each 
instance of a rule in each subset of D. As well, new predicate symbols are introduced 
to further control application of sets of rules. We then give a translation that yields a 
second default theory (D' , W'). Viewed algorithmically, this second theory carries out 
the following: If the original set of defaults D constitutes the set of generating defaults of 
an extension, then a corresponding “ap”-literal is derived; all default consequences are 
obtained; and all subsets of the defaults are rendered inapplicable. If this isn’t the case
(and $D$ isn’t a set of generating defaults), we proceed along the partial order induced by
set inclusion and consider every set $D \setminus \{\delta\}$ for every $\delta \in D$ to see whether it is a set
of generating defaults. Crucially, default conclusions are “tagged” with the name of the 
set in which they appear so as to eliminate possible side effects. 

To name sets of defaults, we take some fixed enumeration $(n_1, \ldots, n_k)$ of $N$ and
define $m$ as a $k$-ary function symbol. Then, for $n_\bot \notin N$, define

$DCA_M$: $\forall x_1, \ldots, x_k.\ \text{set-name}(m(x_1, \ldots, x_k)) \equiv
(x_1 = n_1 \vee x_1 = n_\bot) \wedge \cdots \wedge (x_k = n_k \vee x_k = n_\bot)$.

Intuitively, $x_i = n_\bot$ tells us that $n_i$ does not belong to the set at hand.

Accordingly, for $\vec{x} = x_1..x_k$ and $\vec{x}' = x'_1..x'_k$ define

$UNA_M$: $\forall \vec{x}, \vec{x}'.\ \text{set-name}(m(\vec{x})) = \text{set-name}(m(\vec{x}'))
\equiv x_1 = x'_1 \wedge \cdots \wedge x_k = x'_k$.

The advantage of this “vector-oriented” representation over a dynamic one including
a binary function symbol (as with lists) is that each set has a unique representation.
We write $\forall x \in M.\ P(x)$ instead of $\forall x.\ \text{set-name}(x) \supset P(x)$. Further,
we use $M$ for denoting the set of all valid set-names, that is,

$$M = \{m \mid DCA_M \models \text{set-name}(m)\}.$$

In order to ease notation, we write $m_{1,3}$ instead of $m(n_1, n_\bot, n_3, n_\bot, \ldots, n_\bot)$
when representing the set $\{\delta_1, \delta_3\}$. Also, we abbreviate $m(n_\bot, \ldots, n_\bot)$
by $m_\emptyset$ and $m(n_1, \ldots, n_k)$ by $m_D$. Note the difference between names $n_i$ and
$m_i$, induced by our notational convention.

We also rely on the “vector-oriented” representation for capturing set membership,
denoted by in/2. Consider for instance $N = \{n_1, n_2\}$. Membership is then axiomatized
through the formulas

$$\forall x_1, x_2.\ in(n_1, m(x_1, x_2)) \equiv (n_1 = x_1)$$
$$\forall x_1, x_2.\ in(n_2, m(x_1, x_2)) \equiv (n_2 = x_2).$$

While this validates $in(n_1, m_{1,2})$, it falsifies $in(n_1, m_2)$. See (15) for the general case.
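
The vector-oriented naming scheme can be mimicked with fixed-length tuples. The following is a minimal sketch under our own conventions: `None` plays the role of the extra name $n_\bot$, default names are plain strings, and the helper names are hypothetical. Membership and strict inclusion follow the pattern of the membership formulas above and of axioms (15) and (16).

```python
BOT = None   # stands for the extra name n_bot ("this default is not in the set")


def set_name(names, members):
    """m(x_1..x_k): x_i is n_i if the i-th default belongs to the set, else n_bot."""
    return tuple(n if n in members else BOT for n in names)


def is_in(n, m, names):
    """in(n_i, m(x_1..x_k)) holds iff x_i = n_i."""
    return m[names.index(n)] == n


def strictly_included(m1, m2, names):
    """Some name is in m2 but not in m1, and every member of m1 is in m2."""
    gains = any(not is_in(n, m1, names) and is_in(n, m2, names) for n in names)
    keeps = all(is_in(n, m2, names) for n in names if is_in(n, m1, names))
    return gains and keeps


names = ["n1", "n2"]
m_12 = set_name(names, {"n1", "n2"})
m_2 = set_name(names, {"n2"})
m_empty = set_name(names, set())
```

Because the position of each name is fixed by the enumeration, two descriptions of the same set always produce the same tuple, which is the uniqueness property the text contrasts with a list-based encoding.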

We need to be able to refer to separate instances of the same default appearing
in different sets. For this we introduce a function symbol $\cdot/2$. For
$\delta_j \in D_i$ we write $n_{\delta_j} \cdot m_i$ or $n_j \cdot m_i$ to name the instance of
$\delta_j$ appearing in $D_i$. This results in name set
$N \cdot M = \{n \cdot m \mid n \in N, m \in M\}$. Corresponding axioms, as $DCA_{N \cdot M}$ and
$UNA_{N \cdot M}$, are obtained in a straightforward way. In what follows, we refer to the
various domain closure and unique names axioms pertaining to $N$, $M$, and $N \cdot M$ as
$Ax(N)$.³

Given language $\mathcal{L}$, we define a family of languages $\mathcal{L}(m)$ for $m \in M$
as follows. If $P$ is an $i$-ary predicate symbol then $P(\cdot)$ is a distinct $(i+1)$-ary
predicate symbol. If $\gamma \in \mathcal{L}$ then $\gamma(m) \in \mathcal{L}(m)$ is the formula
obtained by replacing all predicate symbols in $\gamma$ with predicate symbols extended as
described, and with term $m$ as the $(i+1)$st argument. This extra argument is used to
index formulas by the (names of) sets in which they are used.

Lastly, we introduce special-purpose predicates for controlling the application of 
sets of defaults. These are summarised in the following table: 



Name                 Use/meaning
$m \sqsubset m'$     $D_m \subset D_{m'}$
ok(e)                It is ok to try to apply set/rule e
ap(e)                Set/rule e is applied
bl(m)                Not all rules in set m can be applied
ovr(m)               Some set named m' is applied and $m \sqsubset m'$
ko(m)                For set m, bl(m) $\vee$ ovr(m) is true



Taking all this into account, we obtain the following translation, mapping default theories
in language $\mathcal{L}$ onto default theories in the language $\mathcal{L}^+$ obtained by
unioning all languages $\mathcal{L}(m)$ for $m \in M$ and using the aforementioned names and
introduced predicates and functions:

Definition 2. Given a finite default theory $(D, W)$ over $\mathcal{L}$ and its set of
associated default names $N$, define $\mathcal{E}((D, W)) = (D', W')$ over $\mathcal{L}^+$ by

$$D' = D_N \cup D_M \cup D_\sqsubset$$
$$W' = W_D \cup W_W \cup W_M \cup W_\sqsubset \cup Ax(N)$$



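³ Note that names in $M$ and $N \cdot M$ are obtained from those in $N$.

where

$$D_N = \left\{ \frac{\alpha(x) \wedge in(n, x) \wedge ok(n \cdot x) : \beta(x)}{\gamma(x) \wedge ap(n \cdot x)} \;\middle|\; n : \frac{\alpha : \beta}{\gamma} \in D \right\} \tag{8}$$
$$D_M = \left\{ \frac{ok(x) : \neg ko(x)}{\forall y \in N.\ in(y, x) \supset ok(y \cdot x)} \right\} \tag{9}$$
$$\qquad \cup \left\{ \frac{in(n, x) \wedge ok(x) : \neg\alpha(x)}{bl(x)} \;\middle|\; n : \frac{\alpha : \beta}{\gamma} \in D \right\} \tag{10}$$
$$\qquad \cup \left\{ \frac{([\forall y \in N.\ in(y, x) \supset c(y, x)] \supset \neg\beta(x)) \wedge ok(x) :}{bl(x)} \;\middle|\; n : \frac{\alpha : \beta}{\gamma} \in D \right\} \tag{11}$$
$$D_\sqsubset = \left\{ \frac{: \neg(x \sqsubset y)}{\neg(x \sqsubset y)},\ \frac{: \neg in(x, y)}{\neg in(x, y)} \right\} \tag{12}$$
$$W_W = \{\forall x \in M.\ \alpha(x) \mid \alpha \in W\} \tag{13}$$
$$W_D = \{\forall x \in M.\ c(n_\delta, x) \equiv CON(\delta)(x) \mid \delta \in D\} \tag{14}$$
$$W_M = \{\forall x_1, \ldots, x_k.\ in(n_i, m(x_1, \ldots, x_k)) \equiv (n_i = x_i) \mid n_i \text{ in } (n_1, \ldots, n_k)\} \tag{15}$$
$$\qquad \cup \{\forall x, x' \in M.\ [\exists y \in N.\ \neg in(y, x) \wedge in(y, x')] \wedge [\forall y.\ in(y, x) \supset in(y, x')] \supset x \sqsubset x'\} \tag{16}$$
$$W_\sqsubset = \{ok(m_D)\} \tag{17}$$
$$\qquad \cup \{\forall x \in M.\ [\forall y \in M.\ x \sqsubset y \supset bl(y)] \supset ok(x)\} \tag{18}$$
$$\qquad \cup \{\forall x \in M.\ [bl(x) \vee ovr(x)] \supset ko(x)\} \tag{19}$$
$$\qquad \cup \{\forall x \in M.\ [\forall y \in N.\ in(y, x) \supset ap(y \cdot x)] \supset ap(x)\} \tag{20}$$
$$\qquad \cup \{\forall x, x' \in M.\ ap(x) \supset (x' \sqsubset x \supset ovr(x'))\} \tag{21}$$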



The rules in $D_N$ and $D_M$ directly generalise those in (1-3), from treating a single set
named $m$ to an arbitrary set referenced by variable $x$. The specific consequents used in
the second rule in (3) are dealt with via the axioms in ($W_D$/14) that allow us to quantify
over default consequents (via predicate $c$). This trick avoids the exponential blowup that
would occur in (11) if we were to explicitly give the consequences of the rules.

The rules in ($D_\sqsubset$/12) provide us with complete knowledge on predicates $\sqsubset$ and in.
The axioms in ($W_W$/13) propagate the information in $W$ to all possible contexts.

$W_M$ takes care of what we need wrt set operations. That is, (15) formalises set
membership, while (16) formalises strict set inclusion. $W_\sqsubset$ axiomatises the control
flow along the partial order induced by $\sqsubset$. Axioms (17) and (18) tell us when it is ok to
consider a certain set: we always consider the maximum set $D$; otherwise, via (18), we
consider a set just when every superset is known to be blocked (and so inapplicable).
(19) tells us when the consideration of a set is cancelled. This either happens because
a set is inapplicable (given by bl) or because it has been explicitly cancelled (given by
ovr). (20) asserts that a set is applied just if all of its member rules are. Once we have
found an applicable set of rules (and hence a set of generating defaults) we need not
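consider any subset; (21) annuls the consideration of all such subsets.

For example, consider the following normal default theory:

$$\Delta_{22} = \left( \left\{ n_1 : \frac{: A}{A},\ n_2 : \frac{: B}{B},\ n_3 : \frac{: \neg B}{\neg B},\ n_4 : \frac{B : D}{D} \right\}, \emptyset \right) \tag{22}$$

From $\mathcal{E}(\Delta_{22})$ we get an extension, where the only “ap-literals” are
$ap(m_{1,2,4})$ and $ap(m_{1,3})$. That is, $\Delta_{22}$ has two extensions with generating
defaults, the first with $\delta_1, \delta_2, \delta_4$, and the second with $\delta_1, \delta_3$.
Among formulas in the extension of $\mathcal{E}(\Delta_{22})$ are $A(m_{1,2,4})$, $B(m_{1,2,4})$,
$\neg B(m_{1,3})$, and $D(m_{1,2,4})$. To see this, let us take a closer look at the image of
$\Delta_{22}$, namely $\mathcal{E}(\Delta_{22})$. For $D_N$, we get

$$\frac{in(n_1, x) \wedge ok(n_1 \cdot x) : A(x)}{A(x) \wedge ap(n_1 \cdot x)} \qquad \frac{in(n_2, x) \wedge ok(n_2 \cdot x) : B(x)}{B(x) \wedge ap(n_2 \cdot x)} \tag{23}$$
$$\frac{in(n_3, x) \wedge ok(n_3 \cdot x) : \neg B(x)}{\neg B(x) \wedge ap(n_3 \cdot x)} \qquad \frac{B(x) \wedge in(n_4, x) \wedge ok(n_4 \cdot x) : D(x)}{D(x) \wedge ap(n_4 \cdot x)} \tag{24}$$

We get a single nontrivial rule in (10), namely

$$\frac{in(n_4, x) \wedge ok(x) : \neg B(x)}{bl(x)} \tag{25}$$

and four rules in (11)

$$\frac{([\forall y \in N.\ in(y, x) \supset c(y, x)] \supset \neg A(x)) \wedge ok(x) :}{bl(x)} \tag{26}$$
$$\frac{([\forall y \in N.\ in(y, x) \supset c(y, x)] \supset \neg B(x)) \wedge ok(x) :}{bl(x)} \tag{27}$$
$$\frac{([\forall y \in N.\ in(y, x) \supset c(y, x)] \supset B(x)) \wedge ok(x) :}{bl(x)} \tag{28}$$
$$\frac{([\forall y \in N.\ in(y, x) \supset c(y, x)] \supset \neg D(x)) \wedge ok(x) :}{bl(x)} \tag{29}$$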

Given $ok(m_D)$, we may consider any rule in $D_M$. However, given that
$\forall y \in N.\ in(y, m_D)$ is true, we obtain that (14) and
$\forall y \in N.\ in(y, m_D) \supset c(y, m_D)$ are inconsistent and thus imply any formula.
Consequently, rules (26) to (29) are applicable and provide $bl(m_D)$, yielding $ko(m_D)$,
which in turn blocks (9) for $x = m_D$. From (16), we obtain (among other relations)
$m_{1,2,3} \sqsubset m_D$, $m_{1,2,4} \sqsubset m_D$, $m_{1,3,4} \sqsubset m_D$, and
$m_{2,3,4} \sqsubset m_D$. From (18), we then get $ok(m_{1,2,3})$, $ok(m_{1,2,4})$,
$ok(m_{1,3,4})$, and $ok(m_{2,3,4})$.

Now, consider $ok(m_{1,2,4})$. From (9), we obtain

$$\forall y \in N.\ in(y, m_{1,2,4}) \supset ok(y \cdot m_{1,2,4})$$

yielding $ok(n_1 \cdot m_{1,2,4})$, $ok(n_2 \cdot m_{1,2,4})$, and $ok(n_4 \cdot m_{1,2,4})$. This
allows us to apply three of the four rules in (23/24) and we obtain
$A(m_{1,2,4}) \wedge ap(n_1 \cdot m_{1,2,4})$, $B(m_{1,2,4}) \wedge ap(n_2 \cdot m_{1,2,4})$, and
$D(m_{1,2,4}) \wedge ap(n_4 \cdot m_{1,2,4})$. From (20), we obtain $ap(m_{1,2,4})$, from which
we deduce with (21) in turn $ovr(m_{1,2})$, $ovr(m_{2,4})$, ..., $ovr(m_4)$, and
$ovr(m_\emptyset)$.

Next, consider $ok(m_{1,2,3})$. As with $ok(m_D)$, we obtain an inconsistency among
$in(n_1, m_{1,2,3})$, $in(n_2, m_{1,2,3})$, $in(n_3, m_{1,2,3})$,
$\forall y \in N.\ in(y, m_{1,2,3}) \supset c(y, m_{1,2,3})$, and (14). This validates the
prerequisites of Rules (26), (27), and (28), thus yielding $bl(m_{1,2,3})$. As above, we
then get from $W_M$ that $ok(m_{1,2})$, $ok(m_{1,3})$, $ok(m_{2,3})$. Note that we have already
obtained $ovr(m_{1,2})$ from $ap(m_{1,2,4})$.

Given $ok(m_{1,3})$, (9) provides us with $ok(n_1 \cdot m_{1,3})$ and $ok(n_3 \cdot m_{1,3})$.
Using the two first rules in (23/24), we get $A(m_{1,3}) \wedge ap(n_1 \cdot m_{1,3})$ and
$\neg B(m_{1,3}) \wedge ap(n_3 \cdot m_{1,3})$. From (20), we then get $ap(m_{1,3})$, from which
we deduce with (21) in turn $ovr(m_1)$, $ovr(m_3)$, and $ovr(m_\emptyset)$ (again).

Given $ok(m_{2,3})$, along with the fact that $in(n_2, m_{2,3})$, $in(n_3, m_{2,3})$,
$\forall y \in N.\ in(y, m_{2,3}) \supset c(y, m_{2,3})$, and (14) imply $B(m_{2,3})$ and
$\neg B(m_{2,3})$, Rules (27) and (28) fire and we get $bl(m_{2,3})$.
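
The control flow traced above can also be read procedurally. The sketch below is an assumption-laden paraphrase of axioms (17)-(21) and rules (9)-(11), not the translation itself: it works on a literal-only fragment (literals are strings, "-B" is the complement of "B"), treats each normal default as a pair (prerequisite literals, consequent), and walks the subsets of $D$ from the largest down. All names are ours.

```python
from itertools import combinations


def neg(lit):
    """Complement of a literal written as 'B' or '-B'."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def jointly_applicable(rules, W):
    """Can every normal default (pre, con) in `rules` be applied together?"""
    context = set(W) | {con for _, con in rules}
    if any(neg(l) in context for l in context):    # a justification is refuted, cf. (11)
        return False
    derived, pending = set(W), list(rules)
    while pending:
        ready = [r for r in pending if r[0] <= derived]
        if not ready:                               # a prerequisite is underivable, cf. (10)
            return False
        derived |= {con for _, con in ready}
        pending = [r for r in pending if r not in ready]
    return True


def applied_sets(defaults, W=frozenset()):
    """Sets of default names that end up 'applied' when walking down from D."""
    names = sorted(defaults)
    subsets = [frozenset(c) for r in range(len(names), -1, -1)
               for c in combinations(names, r)]     # from D down to the empty set
    ap, bl, ovr = set(), set(), set()
    for m in subsets:
        ok = all(y in bl for y in subsets if m < y)     # (17) and (18)
        if not ok or m in ovr:                           # (19): ko(m) blocks rule (9)
            continue
        if jointly_applicable([defaults[n] for n in m], W):
            ap.add(m)                                    # (20)
            ovr.update(x for x in subsets if x < m)      # (21)
        else:
            bl.add(m)
    return ap


# Theory (22): n1 = :A/A, n2 = :B/B, n3 = :-B/-B, n4 = B:D/D, with empty W.
delta_22 = {1: (frozenset(), "A"), 2: (frozenset(), "B"),
            3: (frozenset(), "-B"), 4: (frozenset({"B"}), "D")}
applied = applied_sets(delta_22)
```

On theory (22) this walk marks exactly the sets $\{\delta_1, \delta_2, \delta_4\}$ and $\{\delta_1, \delta_3\}$ as applied, matching the two “ap-literals” named in the example. Unlike the translation, the sketch enumerates all subsets explicitly, so it is exponential in $|D|$ and serves only to illustrate the order in which sets are considered.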

The next results show that our default theories resulting from $\mathcal{E}$ have appropriate
properties.

Theorem 4. Let $E$ be a consistent extension of $\mathcal{E}((D, W))$ for normal default theory
$(D, W)$. We have for all $\delta \in D$ and for all $D_m, D_{m'} \subseteq D$ that:

1. $(m \sqsubset m') \in E$ iff $\neg(m \sqsubset m') \notin E$

2. $in(n_\delta, m) \in E$ iff $\neg in(n_\delta, m) \notin E$

3. $ok(m) \in E$ if $ovr(m) \notin E$

4. $ok(m) \in E$ if ($ap(m) \in E$ or $bl(m) \in E$)

5. $ap(m) \in E$ iff $ko(m) \notin E$

6. $ko(m) \in E$ iff ($bl(m) \in E$ or $ovr(m) \in E$)

7. $ovr(m) \in E$ iff $ap(m') \in E$ and $m \sqsubset m' \in E$ for some $m' \in M$.

8. If $ap(m) \in E$ then $bl(m') \in E$ for all $m' \in M$ with $m \sqsubset m' \in E$.

9. If $ap(m) \in E$ then $ovr(m') \in E$ for all $m' \in M$ with $m' \sqsubset m \in E$.

10. If $ap(m), ap(m') \in E$ then $\neg(m \sqsubset m') \in E$



Theorem 5. If $(D, W)$ is a normal default theory then $\mathcal{E}((D, W))$ has a unique
extension.

The next two theorems show that our translation captures an encoding of extensions 
of a normal default theory. 

Theorem 6. Let $(D, W)$ be a normal default theory and let $E$ be the extension of
$\mathcal{E}((D, W))$.

Then for any ap($m$) $\in E$ with $m \in M$, we have that $Th(\{\gamma \mid \gamma(m) \in E\})$ is an
extension of $(D, W)$.



Theorem 7. Let $(D, W)$ be a normal default theory with extensions $E_1, \ldots, E_n$ and $E$
be the extension of $\mathcal{E}((D, W))$.

Then, for any $i \in \{1, \ldots, n\}$, there is some $m \in M$ naming $GD(D, E_i)$ such that
ap($m$) $\in E$.

Lastly, our claim that a translated theory is “almost” a constant factor larger than
the original requires elaboration. $\mathit{UNA}_N$ yields a quadratic number of unique names
assertions. In practice this is no problem, since any sensible implementation would not
explicitly list such axioms. With the exception of unique names assertions, a translated
theory is a constant factor larger than the original. To see this, it suffices to examine
Definition 2. Each of (8, 10, 11, 14, 15) introduces $|D|$ axioms/rules; (13) introduces $|W|$
axioms. All remaining terms introduce a single axiom. Moreover, the size of individual
axioms is similarly bounded. (For example, each instance of (8) is a constant factor
larger than the original default.)

5 Discussion

We have shown how we can encode a normal default theory so that the extension from 
the encoding represents all extensions of the original theory. These results don’t rely on 
the normal form of the defaults, but rather on the fact that normal default theories are
semi-monotonic, that is, on the fact that if $E$ is an extension of $(D, W)$, then there is an
extension $E' \supseteq E$ of $(D \cup D', W)$. The results of the previous sections then extend to
any such theory.

The fact that we encode all extensions of a theory within a single extension means 
that we can now encode phenomena of interest, usually dealt with at the metalevel, at 
the object level. Specifically we can now encode the notions of skeptical and credulous 
inference within a theory. In order to do this, we introduce two new constants skep and 
cred, for “skeptical” and “credulous” respectively. 

A formula is a skeptical inference if it is a member of every extension. In our approach,
this means that it follows in every “ap-set”. Hence we define skeptical inference within
a theory, for a given formula $\gamma$, by

$$(\forall x \in M.\ \mathit{ap}(x) \supset \gamma(x)) \supset \gamma(\mathit{skep}).$$

For credulous inference there are a number of possibilities. The simplest is to assert that
a formula is a credulous inference if it is a member of some extension:

$$(\exists x \in M.\ \mathit{ap}(x) \wedge \gamma(x)) \supset \gamma(\mathit{cred}).$$

With this definition, a formula and its negation may be credulous inferences. A stronger
definition is to assert that a formula is a credulous inference if it is a member of some
extension, and its negation is a member of no extension. We can define this notion of
credulous inference (indicated by $\mathit{cred}'$) for a formula $\gamma$ by means of the default:

$$\frac{\exists x \in M.\ \mathit{ap}(x) \wedge \gamma(x) \;:\; \forall x \in M.\ \mathit{ap}(x) \supset \gamma(x)}{\gamma(\mathit{cred}')}$$

Hence in Example (22), we obtain that $A$ is a skeptical inference, while $D$ is a $\mathit{cred}'$ulous
inference. $B$ and $\neg B$ are credulous inferences.
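The three notions can be illustrated outside the logic by quantifying over a given list of extensions. The following is only a minimal sketch: it assumes extensions are approximated by finite sets of literals (with "~X" as the negation of "X"), and the two sets in `EXTENSIONS` are hypothetical, chosen so that they reproduce the pattern reported above. The function names are illustrative and do not come from the paper.

```python
def negate(literal):
    """Return the complementary literal; "~X" stands for the negation of "X"."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def is_skeptical(literal, extensions):
    """skep: the literal belongs to every extension."""
    return all(literal in ext for ext in extensions)


def is_credulous(literal, extensions):
    """cred: the literal belongs to at least one extension."""
    return any(literal in ext for ext in extensions)


def is_strongly_credulous(literal, extensions):
    """cred': in some extension, while its negation is in no extension."""
    return is_credulous(literal, extensions) and not is_credulous(
        negate(literal), extensions
    )


# Hypothetical extensions (an assumption made for illustration only).
EXTENSIONS = [{"A", "B", "D"}, {"A", "~B"}]

for lit in ("A", "B", "~B", "D"):
    print(
        lit,
        is_skeptical(lit, EXTENSIONS),
        is_credulous(lit, EXTENSIONS),
        is_strongly_credulous(lit, EXTENSIONS),
    )
```

In the encoding itself these checks are not performed at the metalevel; they are expressed by the implications and the default given above, with the quantification ranging over names of applicable sets of defaults.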

We have suggested that the approach may be applicable in diagnosis programs, such 
as found in [10]. Similarly, the approach can be used to directly encode applications ex- 
pressible in Theorist [8]. That is, there is a correspondence between so-called Poole-type 
theories and Theorist with constraints [3]. Since Poole-type theories are semi-monotonic,
this means that our approach can encode any application encodable in Theorist. 

Our approach relies on a first-order language. Despite this, the image of a theory over 
a finite language remains finite. As regards implementation, however, it is not advisable 
to use a bottom-up grounding approach, as done in many implementations of extended 
logic programming [4,7]. Instead, a query-oriented approach seems to be advantageous,
because it may rely on unification rather than ground instantiation. 

In Definition 2, sets of defaults were ordered based on the partial order given by 
set containment. This order represents one example of a preference order on sets of 
defaults. A natural avenue for future work would be to generalise our approach to address
arbitrary preference orders on sets of defaults. In an arbitrary preference order on sets,
one could represent desiderata as found in configuration, scheduling, or (generally) 
decision-theoretic problems. This could also be combined with the present approach 
yielding an encoding of preferences on extensions. Hence, for our diagnosis example, 
we might want to prefer extensions (diagnoses) on the basis of an ordering based on 
reliability of components. 

6 Conclusion 

We have described an approach for encoding default extensions within a single extension. 
Using constants and functions for naming, we can refer to default rules, sets of defaults, 
and instances of a rule in a set. Via these names we can, first, determine whether a set of 
defaults is its own set of generating defaults and, second, consider the application of sets 
of defaults ordered by set containment. The translated theory requires a modest increase 
in space: except for unique names axioms, only a constant-factor increase is needed. The
translated theory is a (regular, Reiter) default theory. Hence we essentially axiomatise 
the notion of “extensions” for a class of default theories in a single extension. Further, 
we are able to prove that our translation behaves correctly. 

Using the approach we can now express notions such as skeptical and credulous 
inference within a theory. Arguably this will prove beneficial in expressing at the object 
level problems and approaches generally expressed at the metalevel. Areas of application 
range from specific areas such as diagnosis, to broadly-applicable approaches such as
Theorist. Lastly, we suggest that the approach may be easily extended to address arbitrary
preferences over sets of defaults.

Abstract. Conditionals (“if-then-rules”) are most important objects in
knowledge representation, commonsense reasoning and belief revision. 
Due to their non-classical nature, however, they are not easily dealt with. 
This paper presents a new approach to conditionals, which is apt to cap- 
ture their dynamic power peculiarly well. We show how this approach 
can be applied to represent conditional knowledge inductively. In particu- 
lar, we generalize system-Z* as an appropriate counterpart to maximum 
entropy-representations in a semi-quantitative setting. 



1 Introduction 

Relationships amongst propositions are crucial pieces of knowledge. Usually, they 
express plausible connections, bring isolated facts together and help us obtain a 
coherent image of the world. Such relationships may be represented in a most 
general form by if-then-conditionals. Conditionals are omnipresent, in everyday
life as well as in scientific environments. We make use of conditional knowledge 
when we avoid puddles on sidewalks (being aware of “If you step into a puddle, 
then your feet might get wet”), and when we expect high wheat prices from 
observing cold and rainy weather in spring and summer (due to “If the growing 
weather is poor then there will be an increase in price of wheat”). Conditionals 
represent generic knowledge, acquired inductively from experience or learned 
from books. They tie a flexible and highly interrelated network of connections 
along which reasoning is possible and which can be applied to different situations. 
Moreover, as plausible yet defeasible conclusions, conditionals are intimately 
related to nonmonotonic reasoning, or, in general, to uncertain reasoning. In 
belief revision, they take the role of revision policies, guiding changes of beliefs 
when new (propositional) evidence becomes apparent. So, in contrast to factual 
knowledge which is mostly static, conditionals bear a clearly dynamic flavor. 

The key to getting conditionals right is to accept their non-classical nature: conditionals
are not simply “true” or “false”. In a particular situation, a conditional
is applicable (you actually step into a puddle) or not (you simply walk around);
it can be found confirmed (you step into a puddle and indeed, your feet get wet)
or violated (you step into a puddle, but your feet remain dry because you are
wearing rain boots). So the central problem in dealing with conditional knowledge
is to handle adequately, on the one hand, inactive (or neutral, respectively)
behavior, and, on the other hand, active as well as polarizing behavior.

Handling Conditionals Adequately in Uncertain Reasoning
S. Benferhat and P. Besnard (Eds.): ECSQARU 2001, LNAI 2143, pp. 604-615, 2001.
© Springer-Verlag Berlin Heidelberg 2001

The main object of this paper is 

— to sketch a new theory for conditionals which captures their three-valued, 
dynamic nature peculiarly well, 

— to show how it is realized in a semi-quantitative as well as in a probabilistic 
framework, 

— to apply this theory to the problem of representing conditional knowledge 
inductively in both frameworks. 

We will focus on the process of establishing conditional beliefs, considering con- 
ditionals as agents shifting possible worlds in order to establish relationships 
and beliefs. We will represent the effects (the learning of) conditionals have on 
worlds by conditional structures. Handling conditionals adequately then means 
to choose representations which are balanced with respect to the structures of 
the conditionals under consideration. In this way, highly complex interactions 
between different conditionals can be taken into account and maintained. 

A well-known probabilistic method that follows this idea is the principle 
of maximum entropy (ME-principle) [Par94]. In a semi-quantitative setting, 
system-Z* [GMP93] also realizes this approach, though it seems to be of re- 
stricted applicability. In this paper, we will generalize system-Z* to make it 
applicable for any consistent set of conditionals. The generalized system-Z*- 
approach will be shown to be theoretically justified, and will prove to handle 
even problematic examples adequately. Although this generalization of system- 
Z* is important in itself, it should be taken as only one example of how the con- 
ditional approach presented in this paper can be used. Due to separating strictly 
between structural and numerical aspects of conditionals, the new conditional 
theory can be applied in any (semi-)quantitative framework that allows the rep- 
resentation of conditional beliefs. Therefore, it has also important consequences 
for possibility theory, which has seen a lot of important work on conditionals in 
the last decade (see e.g. [BDP97], [BSS00]). This connection to possibility theory
is elaborated in [KI01b].

The following section summarizes fundamental facts about conditionals, ordi- 
nal conditional functions and probability distributions. Section 3 briefly sketches 
the ME-principle, as well as system-Z and system-Z*. Section 4 presents the
generalized system-Z* approach. The new dynamic approach to conditionals is
presented in Section 5, and is linked to (semi-)quantitative representation of 
knowledge in Section 6. An outlook on further results concludes this paper in 
Section 7. 

2 Conditionals, Plausibility, and Probability 

We consider a propositional language $\mathcal{L}$ over a finite alphabet $\mathcal{V} = \{a, b, c, \ldots\}$.
Let $\Omega$ be the complete set of interpretations of $\mathcal{L}$, where each $\omega \in \Omega$ is taken to
be a possible world for $\mathcal{L}$. To simplify notation, we will write $\overline{A}$ instead of $\neg A$,
and $AB$ instead of $A \wedge B$, for formulas $A, B \in \mathcal{L}$. Conditionals $(B|A)$ represent
statements of the form “If $A$ then $B$”, expressing a relationship between two
(propositional) formulas $A$, the antecedent or premise, and $B$, the consequent.
$(\mathcal{L} \mid \mathcal{L})$ denotes the set of all conditionals $(B|A)$ with $A, B \in \mathcal{L}$. A world $\omega \in \Omega$
is said to verify a conditional $(B|A)$ if $\omega \models AB$; it falsifies $(B|A)$ if $\omega \models A\overline{B}$;
if $\omega \models \overline{A}$, then $(B|A)$ is not applicable to $\omega$.
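The three possible attitudes of a world towards a conditional can be made concrete with a small sketch. It assumes that a world is a dictionary from atoms to truth values and that a conditional is a pair of Python predicates (antecedent, consequent); the atom names `step` and `wet` are illustrative and echo the puddle example from the introduction.

```python
from itertools import product


def evaluate(conditional, world):
    """Return 1 (verified), 0 (falsified) or "u" (not applicable)."""
    antecedent, consequent = conditional
    if not antecedent(world):
        return "u"
    return 1 if consequent(world) else 0


# (wet | step): "If you step into a puddle, then your feet get wet."
puddle = (lambda w: w["step"], lambda w: w["wet"])

# All four worlds over the atoms step, wet, in the order TT, TF, FT, FF.
worlds = [dict(zip(("step", "wet"), bits)) for bits in product([True, False], repeat=2)]
values = [evaluate(puddle, w) for w in worlds]
print(values)
```

The two worlds in which the antecedent fails are both mapped to "u": the conditional says nothing about them, which is exactly the neutral behavior that a two-valued reading would lose.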

Epistemic states as representations of cognitive states of intelligent agents
provide an adequate framework for conditionals. Two widely used types of
epistemic states are probability distributions and ordinal conditional functions
(OCFs), which are based on the notion of plausibility. In short, OCFs are functions
$\kappa : \Omega \to \mathbb{N} \cup \{0, \infty\}$ from the set of worlds to the natural numbers, extended
by 0 and $\infty$. They specify non-negative integers as degrees of plausibility (or,
more precisely, as degrees of disbelief) for worlds. For propositional formulas
$A, B \in \mathcal{L}$, we set $\kappa(A) = \min\{\kappa(\omega) \mid \omega \models A\}$. A proposition $A$ is believed
iff $\kappa(\overline{A}) > 0$, which is denoted by $\kappa \models A$. This may also be specified by degrees
of plausibility (or of disbelief, respectively) by saying that $\kappa \models A[n]$ iff
$\kappa(\overline{A}) > n$ ($n \in \mathbb{N} \cup \{0\}$). A conditional $(B|A) \in (\mathcal{L} \mid \mathcal{L})$ may be assigned a degree
of plausibility via $\kappa(B|A) = \kappa(AB) - \kappa(A)$. $\kappa$ satisfies $(B|A)$, written $\kappa \models (B|A)$, iff
$\kappa(AB) < \kappa(A\overline{B})$, i.e. iff $AB$ is more plausible than $A\overline{B}$. We can also specify
a numerical degree of plausibility of a conditional by defining $\kappa \models (B|A)[n]$
iff $\kappa(AB) + n < \kappa(A\overline{B})$ ($n \in \mathbb{N} \cup \{0\}$). OCFs are the qualitative counterpart
of probability distributions (cf. [GMP93], [GP96]). For a probability distribution
$P$, we have $P(B|A) = \frac{P(AB)}{P(A)}$ for $P(A) > 0$, and $P \models (B|A)[x]$ iff $P(B|A) = x$
($x \in [0,1]$).
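These definitions translate almost literally into code. The sketch below assumes an OCF is stored as a dictionary from worlds to ranks, with worlds given as tuples of truth values for the atoms (a, b); the concrete ranking `kappa` is a made-up example, and the helper names are illustrative.

```python
INF = float("inf")


def rank(kappa, formula):
    """kappa(A): the minimal rank of a world satisfying A (infinity if none)."""
    return min((k for world, k in kappa.items() if formula(world)), default=INF)


def believes(kappa, formula):
    """kappa |= A iff the negation of A has rank greater than 0."""
    return rank(kappa, lambda w: not formula(w)) > 0


def accepts(kappa, antecedent, consequent, strength=0):
    """kappa |= (B|A)[n] iff kappa(AB) + n < kappa(A not-B); n = 0 is the plain case."""
    verifying = rank(kappa, lambda w: antecedent(w) and consequent(w))
    falsifying = rank(kappa, lambda w: antecedent(w) and not consequent(w))
    return verifying + strength < falsifying


# Hypothetical OCF over worlds (a, b); at least one world has rank 0.
kappa = {
    (True, True): 0,
    (True, False): 2,
    (False, True): 0,
    (False, False): 1,
}

a = lambda w: w[0]
b = lambda w: w[1]
print(rank(kappa, a), believes(kappa, b), accepts(kappa, a, b, strength=1))
```

With this ranking, $(b|a)$ is accepted with strength 1 but not with strength 2, because the gap between $\kappa(ab) = 0$ and $\kappa(a\overline{b}) = 2$ is exactly 2 and the defining inequality is strict.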

We will consider mostly measure-free conditionals, focusing on structural
aspects, but also allow quantifications. If $\mathcal{R}^* = \{(B_1|A_1)[x_1], \ldots, (B_n|A_n)[x_n]\}$
is a set of (appropriately) quantified conditionals, then $\mathcal{R} =
\{(B_1|A_1), \ldots, (B_n|A_n)\}$ denotes the set of unquantified conditionals, and
vice versa. Throughout this paper, we will assume all OCFs to be finite, and
all probability distributions to be positive. All results to be presented also hold
in the general case, but then need some technical modifications (see [KI01a]).

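3 Maximum Entropy and System-Z$^{(*)}$

In this section, we briefly review well-known model-based approaches to representing
conditional knowledge inductively.

For a consistent set $\mathcal{R}^* = \{(B_1|A_1)[x_1], \ldots, (B_n|A_n)[x_n]\}$ of probabilistic
conditionals, the ME-representation of $\mathcal{R}^*$, $ME(\mathcal{R}^*)$, is the unique distribution
$P$ that maximizes the entropy $H(P) = -\sum_{\omega} P(\omega) \log P(\omega)$ subject to $P \models \mathcal{R}^*$
(see e.g. [Par94], [KI98]). $ME(\mathcal{R}^*)$ can be written as

$$ME(\mathcal{R}^*)(\omega) = \alpha_0 \prod_{\substack{1 \le i \le n \\ \omega \models A_iB_i}} \alpha_i^{1-x_i} \prod_{\substack{1 \le i \le n \\ \omega \models A_i\overline{B_i}}} \alpha_i^{-x_i} \qquad (1)$$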

with the $\alpha_i$'s being appropriately chosen so as to satisfy all conditionals in $\mathcal{R}^*$.

In a semi-quantitative framework, a well-known method to represent a (finite)
set $\mathcal{R} = \{r_i = (B_i|A_i) \mid 1 \le i \le n\}$ of conditionals by an OCF is to apply the
system-Z of Goldszmidt and Pearl [GMP93], [GP96]. A conditional $(B|A)$ is said
to be tolerated by $\mathcal{R}$ iff there is a world $\omega$ such that $\omega$ verifies $(B|A)$ and $\omega$ does
not falsify any of the conditionals in $\mathcal{R}$. $\mathcal{R}$ is consistent iff there is an ordered
partition $\mathcal{R}_0, \mathcal{R}_1, \ldots, \mathcal{R}_k$ of $\mathcal{R}$ such that each conditional in $\mathcal{R}_m$ is tolerated by
$\bigcup_{j=m}^{k} \mathcal{R}_j$, $0 \le m \le k$. The system-Z ranking function, $\kappa^z$, representing $\mathcal{R}$ is
given by

$$\kappa^z(\omega) = \begin{cases} 0, & \text{if } \omega \text{ does not falsify any } r_i, \\ 1 + \max\limits_{\omega \models A_i\overline{B_i}} Z(r_i), & \text{otherwise} \end{cases}$$

where $Z(r_i) = j$ iff $r_i \in \mathcal{R}_j$. $\kappa^z$ assigns to each world $\omega$ the lowest possible rank
admissible with respect to the constraints in $\mathcal{R}$.
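The tolerance test and the resulting ranking can be sketched in a few lines. This is an illustration under stated assumptions, not the algorithm of [GMP93] verbatim: worlds are dictionaries over five atoms, conditionals are (antecedent, consequent) predicate pairs, and the partition is built by the usual greedy procedure that puts every currently tolerated conditional into the earliest possible layer. The five rules are the conditionals of the extended Nixon diamond used as Example 2 later in the paper.

```python
from itertools import product

ATOMS = ("p", "q", "r", "a", "b")
WORLDS = [dict(zip(ATOMS, bits)) for bits in product([True, False], repeat=len(ATOMS))]

RULES = {
    "r1": (lambda w: w["q"], lambda w: w["p"]),      # Quakers are pacifists
    "r2": (lambda w: w["r"], lambda w: not w["p"]),  # Republicans are not pacifists
    "r3": (lambda w: w["q"], lambda w: w["a"]),      # Quakers are Americans
    "r4": (lambda w: w["a"], lambda w: w["b"]),      # Americans like baseball
    "r5": (lambda w: w["q"], lambda w: not w["b"]),  # Quakers do not like baseball
}


def verifies(rule, world):
    antecedent, consequent = rule
    return antecedent(world) and consequent(world)


def falsifies(rule, world):
    antecedent, consequent = rule
    return antecedent(world) and not consequent(world)


def tolerated(name, rules):
    """Some world verifies rules[name] and falsifies no rule in rules."""
    return any(
        verifies(rules[name], w) and not any(falsifies(r, w) for r in rules.values())
        for w in WORLDS
    )


def z_partition(rules):
    """Greedy ordered partition; raises ValueError if the set is inconsistent."""
    remaining, layers = dict(rules), []
    while remaining:
        layer = sorted(n for n in remaining if tolerated(n, remaining))
        if not layer:
            raise ValueError("set of conditionals is inconsistent")
        layers.append(layer)
        for name in layer:
            del remaining[name]
    return layers


def z_rank(world, rules, layers):
    """0 if nothing is falsified, else 1 + the highest layer index falsified."""
    z = {name: j for j, layer in enumerate(layers) for name in layer}
    hit = [z[name] for name, rule in rules.items() if falsifies(rule, world)]
    return 1 + max(hit) if hit else 0


LAYERS = z_partition(RULES)
print(LAYERS)
```

For these rules the procedure yields the two layers {r2, r4} and {r1, r3, r5}, in agreement with the partition stated for Example 2.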

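A more sophisticated representation is obtained by combining the system-Z
approach with the probabilistic ME-principle, yielding system-Z* [GMP93]. The
corresponding Z*-rankings of the conditionals in $\mathcal{R}$ have to satisfy (see [GMP93])

$$Z^*(r_i) + \min_{\omega \models A_i\overline{B_i}} \sum_{\substack{j \ne i \\ \omega \models A_j\overline{B_j}}} Z^*(r_j) = 1 + \min_{\omega \models A_iB_i} \sum_{\substack{j \ne i \\ \omega \models A_j\overline{B_j}}} Z^*(r_j) \qquad (2)$$

and $\kappa^*$ is then calculated by

$$\kappa^*(\omega) = \sum_{\substack{1 \le i \le n \\ \omega \models A_i\overline{B_i}}} Z^*(r_i) \qquad (3)$$

Z*-entailment $|\!\!\sim^*$ is defined by $\mathcal{R} \mathrel{|\!\!\sim^*} (B|A)$ iff $\kappa^*(AB) < \kappa^*(A\overline{B})$. System-Z*
handles conditional knowledge quite appropriately (cf. [GMP93]). In particular,
Z*-entailment satisfies the crucial property of irrelevance: If $d$ is an atomic
proposition not appearing in any of the conditionals in $\mathcal{R}$, then

$$\mathcal{R} \mathrel{|\!\!\sim^*} (B|A) \quad \text{iff} \quad \mathcal{R} \mathrel{|\!\!\sim^*} (B|Ad) \qquad (4)$$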

In [GMP93], a procedure is given to calculate Z*-rankings for so-called
minimal-core sets, i.e. sets $\mathcal{R}$ of conditionals such that for each conditional $r \in \mathcal{R}$,
there is a world $\omega \in \Omega$ that falsifies $r$ and no other conditional in $\mathcal{R}$. Bourne and
Parsons [BP99] presented an algorithm that computes the Z*-ranking whenever
(2) possesses a unique solution. This algorithm is also able to take variable
strength of conditionals into account, that is, to calculate solutions to

$$Z^*(r_i) + \min_{\omega \models A_i\overline{B_i}} \sum_{\substack{j \ne i \\ \omega \models A_j\overline{B_j}}} Z^*(r_j) = n_i + \min_{\omega \models A_iB_i} \sum_{\substack{j \ne i \\ \omega \models A_j\overline{B_j}}} Z^*(r_j), \qquad (5)$$

so that $\kappa^* \models (B_i|A_i)[n_i]$. There are, however, sets of conditionals that do not
specify unique solutions to (2), as the following example shows.

Example 1. For the set $\mathcal{R} = \{r_1 : (b|a),\ r_2 : (c|a),\ r_3 : (c|ab)\}$, (2) admits
multiple solutions: $(Z^*(r_1), Z^*(r_2), Z^*(r_3))$ may be any one of, for instance,
$(1, 0, 1)$, $(1, 1, 0)$, or even $(2, -1, 2)$ (cf. [BP99]).

Bourne and Parsons argue that some examples may be too complex to be dealt
with by a “flat” system-Z* approach and advise using variable strength
conditionals to enforce unique solvability. This seems a bit strange: Example 1
does not look very complex, and the probabilistic ME-method yields a unique
solution even if all three conditionals are assigned the same probability.
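The non-uniqueness in Example 1 can be checked mechanically. The sketch below assumes that condition (2) reads: for each conditional $r_i$, the value $Z^*(r_i)$ plus the minimal penalty of the other conditionals over the worlds falsifying $r_i$ equals 1 plus the minimal penalty of the other conditionals over the worlds verifying $r_i$, where the penalty of a world is the sum of the $Z^*$-values of the conditionals it falsifies. Worlds are tuples (a, b, c); all function names are illustrative.

```python
from itertools import product

WORLDS = list(product([True, False], repeat=3))  # truth values for (a, b, c)

RULES = [
    (lambda w: w[0], lambda w: w[1]),           # r1 = (b|a)
    (lambda w: w[0], lambda w: w[2]),           # r2 = (c|a)
    (lambda w: w[0] and w[1], lambda w: w[2]),  # r3 = (c|ab)
]


def verifies(rule, w):
    return rule[0](w) and rule[1](w)


def falsifies(rule, w):
    return rule[0](w) and not rule[1](w)


def penalty(w, z, skip):
    """Sum of z-values of all rules other than rule `skip` that w falsifies."""
    return sum(z[j] for j, rule in enumerate(RULES) if j != skip and falsifies(rule, w))


def solves_eq2(z):
    """Check the assumed reading of condition (2) for every conditional."""
    for i, rule in enumerate(RULES):
        over_falsifying = min(penalty(w, z, i) for w in WORLDS if falsifies(rule, w))
        over_verifying = min(penalty(w, z, i) for w in WORLDS if verifies(rule, w))
        if z[i] + over_falsifying != 1 + over_verifying:
            return False
    return True


for candidate in [(1, 0, 1), (1, 1, 0), (2, -1, 2), (1, 1, 1)]:
    print(candidate, solves_eq2(candidate))
```

The three triples listed in Example 1 pass the check, whereas $(1, 1, 1)$ does not, so under this reading the system indeed has several solutions for a rather small set of conditionals.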

The problem with Goldszmidt & Pearl’s and Bourne & Parsons’ work is that, 
in order to use as much of the inferential power of the ME-method as possible, 
they cling too closely to probabilistic ME-techniques. The idea we will present 
and pursue here is to reveal the pattern that makes ME-methods a most adequate 
tool for handling conditional information and to transfer this pattern into the 
framework of semi-quantitative knowledge representation. This will bring forth 
a more general approach of which system-Z* turns out to be a special instance. 

4 Generalizing System-Z* 

System-Z* suggests a straightforward generalization for representing a set $\mathcal{R} =
\{r_i = (B_i|A_i) \mid 1 \le i \le n\}$ of conditionals by an OCF: Use approach (3) and
determine the $Z^*(r_i)$ so that $\kappa^*$ satisfies all of the conditionals $r_i$, i.e. such that

$$Z^*(r_i) > \min_{\omega \models A_iB_i} \sum_{\substack{j \ne i \\ \omega \models A_j\overline{B_j}}} Z^*(r_j) - \min_{\omega \models A_i\overline{B_i}} \sum_{\substack{j \ne i \\ \omega \models A_j\overline{B_j}}} Z^*(r_j) \qquad (6)$$

for $1 \le i \le n$. That means that condition (2) is weakened to be satisfied as an
inequality constraint. Variable strengths of OCF-conditionals can be taken easily
into account by adding them on the right-hand side of (6), thus generalizing (5).
This generalization, however, seems to be quite ad hoc, leaving the ME-track
and thus without theoretical justification.

Quite to the contrary - in the sequel, we will show that actually, this gen- 
eralized approach emerged from a formal principle of conditional indifference 
that was formalized in [KI98] as one of four axioms apt to characterize the 
ME-techniques. This principle is based on observing conditional structures and 
provides a powerful methodology, not only to represent conditional knowledge 
appropriately, but even to guide the revision of epistemic states by conditional 
beliefs. In the next section, we will briefly develop the necessary theoretical back- 
ground, which is purely algebraic and therefore can be applied in a probabilistic 
as well as in a semi-quantitative setting. Just as for the ME-methods, it will 
provide an intelligible and appealing scheme for the inductive representation of 
conditionals. 

5 A Dynamic Approach to Conditionals 

By observing the attitude of worlds with respect to it, each conditional $(B|A)$
can be considered as a generalized (three-valued) indicator function on worlds:

$$(B|A)(\omega) = \begin{cases} 1 & : \ \omega \models AB \\ 0 & : \ \omega \models A\overline{B} \\ u & : \ \omega \models \overline{A} \end{cases} \qquad (7)$$

where $u$ stands for unknown or indeterminate (see, e.g., [Cal91]). Intuitively,
representing or incorporating a conditional as a plausible conclusion in an
epistemic state means to make (at least some) worlds verifying the conditional
more plausible than the worlds falsifying it. In this sense, conditionals to be
learned have effects on possible worlds, shifting them appropriately to establish
the intended relationship. Which worlds will actually be shifted depends on the
chosen inductive representation procedure: for the conditional $(B|A)$, all worlds
in either of the partitioning sets $AB$, $A\overline{B}$ and $\overline{A}$ are indistinguishable.

When we consider (finite) sets of conditionals $\mathcal{R} = \{(B_1|A_1), \ldots, (B_n|A_n)\} \subseteq
(\mathcal{L} \mid \mathcal{L})$, we have to modify the representation (7) appropriately to identify the
effect of each conditional in $\mathcal{R}$ on worlds in $\Omega$. To this end, we replace the
numbers 0 and 1 by abstract symbols, $a_i^+$, $a_i^-$, that we associate to each conditional
$(B_i|A_i)$ in $\mathcal{R}$. Moreover, we will make use of a group structure to represent the
joint impact of conditionals on worlds.

So let $\mathcal{F}_{\mathcal{R}} = \langle a_1^+, a_1^-, \ldots, a_n^+, a_n^- \rangle$ be the free abelian group with generators
$a_1^+, a_1^-, \ldots, a_n^+, a_n^-$, i.e. $\mathcal{F}_{\mathcal{R}}$ consists of all elements of the form $(a_1^+)^{r_1}(a_1^-)^{s_1} \cdots
(a_n^+)^{r_n}(a_n^-)^{s_n}$ with integers $r_i, s_i \in \mathbb{Z}$ (the ring of integers). Each element of
$\mathcal{F}_{\mathcal{R}}$ can be identified by its exponents, so that $\mathcal{F}_{\mathcal{R}}$ is isomorphic to $\mathbb{Z}^{2n}$. The
commutativity of $\mathcal{F}_{\mathcal{R}}$ corresponds to the fact that the conditionals in $\mathcal{R}$ shall be
effective simultaneously, without assuming any order of application. Note that,
although we will speak of multiplication and products in $\mathcal{F}_{\mathcal{R}}$, the generators of
$\mathcal{F}_{\mathcal{R}}$ are merely juxtaposed, like words.

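For each $i$, $1 \le i \le n$, we define a function $\sigma_i = \sigma_{(B_i|A_i)} : \Omega \to \mathcal{F}_{\mathcal{R}}$ by setting

$$\sigma_i(\omega) := \begin{cases} a_i^+ & \text{if } (B_i|A_i)(\omega) = 1 \\ a_i^- & \text{if } (B_i|A_i)(\omega) = 0 \\ 1 & \text{if } (B_i|A_i)(\omega) = u \end{cases}$$

$\sigma_i(\omega)$ represents the manner in which the conditional $(B_i|A_i)$ applies to the
possible world $\omega$. The neutral element 1 of $\mathcal{F}_{\mathcal{R}}$ represents the non-applicability
of $(B_i|A_i)$ in case that the antecedent $A_i$ is not satisfied, so the neutral group
element corresponds to a neutral attitude with respect to the conditional. The
function $\sigma_{\mathcal{R}} : \Omega \to \mathcal{F}_{\mathcal{R}}$,

$$\sigma_{\mathcal{R}}(\omega) := \prod_{1 \le i \le n} \sigma_i(\omega) = \prod_{\substack{1 \le i \le n \\ \omega \models A_iB_i}} a_i^+ \prod_{\substack{1 \le i \le n \\ \omega \models A_i\overline{B_i}}} a_i^- \qquad (8)$$

describes the all-over effect of $\mathcal{R}$ on $\omega$. $\sigma_{\mathcal{R}}(\omega)$ is called the conditional structure
of $\omega$ with respect to $\mathcal{R}$. We will illustrate this notion of conditional structures in
the following example which extends the well-known Nixon diamond: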

Example 2. Let $\mathcal{R}$ consist of the following conditionals:
$r_1 : (p|q)$ Quakers are pacifists.
$r_2 : (\overline{p}|r)$ Republicans are not pacifists.
$r_3 : (a|q)$ Quakers are Americans.
$r_4 : (b|a)$ Americans like baseball.
$r_5 : (\overline{b}|q)$ Quakers do not like baseball.

The conditional structure of a possible world, say $pqrab$, is calculated in the
following way: $pqrab$ verifies the first conditional $r_1$ ($pqrab \models pq$), so we have
$\sigma_1(pqrab) = a_1^+$. $pqrab$, however, falsifies the second conditional $r_2 = (\overline{p}|r)$, thus
$\sigma_2(pqrab) = a_2^-$. In the same way, $\sigma_3(pqrab) = a_3^+$, $\sigma_4(pqrab) = a_4^+$, $\sigma_5(pqrab) =
a_5^-$. We obtain $\sigma_{\mathcal{R}}(pqrab) = a_1^+a_2^-a_3^+a_4^+a_5^-$. In the table below, we list the
conditional structures of all possible worlds with respect to $\mathcal{R}$:



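$$
\begin{array}{ll|ll|ll|ll}
\omega & \sigma_{\mathcal{R}}(\omega) & \omega & \sigma_{\mathcal{R}}(\omega) & \omega & \sigma_{\mathcal{R}}(\omega) & \omega & \sigma_{\mathcal{R}}(\omega) \\ \hline
pqrab & a_1^+a_2^-a_3^+a_4^+a_5^- & p\bar{q}rab & a_2^-a_4^+ & \bar{p}qrab & a_1^-a_2^+a_3^+a_4^+a_5^- & \bar{p}\bar{q}rab & a_2^+a_4^+ \\
pqra\bar{b} & a_1^+a_2^-a_3^+a_4^-a_5^+ & p\bar{q}ra\bar{b} & a_2^-a_4^- & \bar{p}qra\bar{b} & a_1^-a_2^+a_3^+a_4^-a_5^+ & \bar{p}\bar{q}ra\bar{b} & a_2^+a_4^- \\
pqr\bar{a}b & a_1^+a_2^-a_3^-a_5^- & p\bar{q}r\bar{a}b & a_2^- & \bar{p}qr\bar{a}b & a_1^-a_2^+a_3^-a_5^- & \bar{p}\bar{q}r\bar{a}b & a_2^+ \\
pqr\bar{a}\bar{b} & a_1^+a_2^-a_3^-a_5^+ & p\bar{q}r\bar{a}\bar{b} & a_2^- & \bar{p}qr\bar{a}\bar{b} & a_1^-a_2^+a_3^-a_5^+ & \bar{p}\bar{q}r\bar{a}\bar{b} & a_2^+ \\
pq\bar{r}ab & a_1^+a_3^+a_4^+a_5^- & p\bar{q}\bar{r}ab & a_4^+ & \bar{p}q\bar{r}ab & a_1^-a_3^+a_4^+a_5^- & \bar{p}\bar{q}\bar{r}ab & a_4^+ \\
pq\bar{r}a\bar{b} & a_1^+a_3^+a_4^-a_5^+ & p\bar{q}\bar{r}a\bar{b} & a_4^- & \bar{p}q\bar{r}a\bar{b} & a_1^-a_3^+a_4^-a_5^+ & \bar{p}\bar{q}\bar{r}a\bar{b} & a_4^- \\
pq\bar{r}\bar{a}b & a_1^+a_3^-a_5^- & p\bar{q}\bar{r}\bar{a}b & 1 & \bar{p}q\bar{r}\bar{a}b & a_1^-a_3^-a_5^- & \bar{p}\bar{q}\bar{r}\bar{a}b & 1 \\
pq\bar{r}\bar{a}\bar{b} & a_1^+a_3^-a_5^+ & p\bar{q}\bar{r}\bar{a}\bar{b} & 1 & \bar{p}q\bar{r}\bar{a}\bar{b} & a_1^-a_3^-a_5^+ & \bar{p}\bar{q}\bar{r}\bar{a}\bar{b} & 1
\end{array}
$$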



Using this table, it is easy to see that $\mathcal{R}$ is consistent, with partition $\mathcal{R}_0 =
\{r_2, r_4\}$ and $\mathcal{R}_1 = \{r_1, r_3, r_5\}$. $\mathcal{R}$, however, is not a minimal-core set, because
each world falsifying $r_1$ (i.e. worlds with label $a_1^-$ in the table) also falsifies at
least one of $r_3, r_4, r_5$. In fact, Goldszmidt & Pearl's system-Z* approach fails
for $\mathcal{R}$. Bourne & Parsons' algorithm, however, computes $Z^*(r_1) = Z^*(r_2) =
Z^*(r_4) = 1$, $Z^*(r_3) = Z^*(r_5) = 2$.
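Conditional structures of single worlds are easy to compute mechanically. The following sketch assumes worlds are dictionaries over the atoms p, q, r, a, b and writes the generators $a_i^+$, $a_i^-$ as the strings "ai+" and "ai-"; the string encoding and the function name `structure` are illustrative choices, not notation from the paper.

```python
from itertools import product

ATOMS = ("p", "q", "r", "a", "b")
WORLDS = [dict(zip(ATOMS, bits)) for bits in product([True, False], repeat=len(ATOMS))]

# The five conditionals of Example 2 as (antecedent, consequent) pairs.
RULES = [
    (lambda w: w["q"], lambda w: w["p"]),      # r1 = (p|q)
    (lambda w: w["r"], lambda w: not w["p"]),  # r2 = (not p|r)
    (lambda w: w["q"], lambda w: w["a"]),      # r3 = (a|q)
    (lambda w: w["a"], lambda w: w["b"]),      # r4 = (b|a)
    (lambda w: w["q"], lambda w: not w["b"]),  # r5 = (not b|q)
]


def structure(world):
    """Conditional structure of a world as a product of generators."""
    symbols = []
    for index, (antecedent, consequent) in enumerate(RULES, start=1):
        if antecedent(world):  # inapplicable conditionals contribute the neutral 1
            symbols.append(f"a{index}{'+' if consequent(world) else '-'}")
    return " ".join(symbols) if symbols else "1"


all_true = dict(p=True, q=True, r=True, a=True, b=True)
print(structure(all_true))

# Worlds falsifying r1 always falsify at least one of r3, r4, r5 as well.
falsify_r1 = [w for w in WORLDS if "a1-" in structure(w)]
print(all(any(s in structure(w) for s in ("a3-", "a4-", "a5-")) for w in falsify_r1))
```

The last check reproduces the observation that $\mathcal{R}$ is not a minimal-core set: no world falsifies $r_1$ alone.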



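$\sigma_{\mathcal{R}}$ labels each world appropriately and makes conditional effects on worlds
comparable and computable. For instance, from the table of Example 2 we see
that $p\bar{q}\bar{r}ab$ and $\bar{p}\bar{q}\bar{r}ab$ have the same conditional structure (namely $a_4^+$), and that
forming the quotient

$$\frac{\sigma_{\mathcal{R}}(p\bar{q}rab)}{\sigma_{\mathcal{R}}(p\bar{q}r\bar{a}b)} = \frac{a_2^-a_4^+}{a_2^-} = a_4^+$$

isolates the effect of the fourth conditional. To make calculations of conditional
structures more convenient and more elegant, we take the worlds $\omega \in \Omega$ as formal
generators of the free abelian group $\widehat{\Omega} := \langle \omega \mid \omega \in \Omega \rangle$. $\widehat{\Omega}$ consists of all
products $\widehat{\omega} = \omega_1^{r_1} \cdots \omega_m^{r_m}$ with $\omega_1, \ldots, \omega_m \in \Omega$, and integers $r_1, \ldots, r_m$.
Introducing such a “multiplication between worlds” is nothing but a technical
means to comply with the multiplicative structure the effects of conditionals
impose on worlds. As in $\mathcal{F}_{\mathcal{R}}$, multiplication in $\widehat{\Omega}$ actually means juxtaposition.
Now $\sigma_{\mathcal{R}}$ may be extended to $\widehat{\Omega}$ in a straightforward manner by setting

$$\sigma_{\mathcal{R}}(\omega_1^{r_1} \cdots \omega_m^{r_m}) = \sigma_{\mathcal{R}}(\omega_1)^{r_1} \cdots \sigma_{\mathcal{R}}(\omega_m)^{r_m},$$

yielding a homomorphism of groups $\sigma_{\mathcal{R}} : \widehat{\Omega} \to \mathcal{F}_{\mathcal{R}}$.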

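Having the same conditional structure defines an equivalence relation $\equiv_{\mathcal{R}}$ on
$\widehat{\Omega}$: $\widehat{\omega}_1 \equiv_{\mathcal{R}} \widehat{\omega}_2$ iff $\sigma_{\mathcal{R}}(\widehat{\omega}_1) = \sigma_{\mathcal{R}}(\widehat{\omega}_2)$. Those elements of $\widehat{\Omega}$ that are balanced with
respect to the effects of conditionals in $\mathcal{R}$ are contained in the kernel of $\sigma_{\mathcal{R}}$,
$\ker \sigma_{\mathcal{R}} = \{\widehat{\omega} \in \widehat{\Omega} \mid \sigma_{\mathcal{R}}(\widehat{\omega}) = 1\}$. $\ker \sigma_{\mathcal{R}}$ does not depend on the chosen
representation of conditional structures by symbols in $\mathcal{F}_{\mathcal{R}}$ and thus, it is an invariant
of $\mathcal{R}$ [KI01a]. In a semi-quantitative as well as in a probabilistic environment,
implicit normalizing constraints have to be taken into account, namely, $\kappa(\top) = 0$
for OCFs, and $P(\top) = 1$ for probability distributions. This can be achieved by
focusing on equivalence with respect to $\sigma_{(\top|\top)}$. Since $\sigma_{(\top|\top)}$ simply counts the
worlds occurring in $\widehat{\omega}$, two elements $\widehat{\omega}_1 = \omega_1^{r_1} \cdots \omega_m^{r_m}$, $\widehat{\omega}_2 = \nu_1^{s_1} \cdots \nu_p^{s_p} \in \widehat{\Omega}$ are
$\sigma_{(\top|\top)}$-equivalent, $\widehat{\omega}_1 \equiv_\top \widehat{\omega}_2$, iff $\sum_{1 \le j \le m} r_j = \sum_{1 \le k \le p} s_k$. This means, $\widehat{\omega}_1 \equiv_\top \widehat{\omega}_2$
iff they both are a (cancelled) product of the same number of generating worlds,
each generator being counted with its corresponding exponent.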

The conditional structures of generalized worlds, $\sigma_{\mathcal{R}}(\widehat{\omega})$, can be considered as
generalized, algebraic versions of the interaction quotients investigated by Good
[Goo63]. $\sigma_{\mathcal{R}}(\widehat{\omega}) = 1$ (i.e. $\widehat{\omega} \in \ker \sigma_{\mathcal{R}}$) indicates that the interactions between the
conditionals in $\mathcal{R}$ are balanced in $\widehat{\omega}$. We will elaborate the consequences of this
idea for representing conditionals adequately in the next section.
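Because the groups involved are free abelian, a generalized world can be stored as a map from worlds to integer exponents, and its conditional structure as a map from generators to integer exponents. The sketch below follows that encoding; the two-rule set $\{(b|a), (c|a)\}$ and all function names are assumptions made for illustration only.

```python
from collections import defaultdict

# Worlds are tuples (a, b, c); the rule set is a hypothetical illustration.
RULES = [
    (lambda w: w[0], lambda w: w[1]),  # (b|a)
    (lambda w: w[0], lambda w: w[2]),  # (c|a)
]


def sigma(world):
    """Generators contributed by a single world, e.g. ["a1+", "a2-"]."""
    return [
        f"a{i}{'+' if consequent(world) else '-'}"
        for i, (antecedent, consequent) in enumerate(RULES, start=1)
        if antecedent(world)
    ]


def sigma_hat(generalized_world):
    """Homomorphic extension: exponents of worlds carry over to generators."""
    exponents = defaultdict(int)
    for world, power in generalized_world.items():
        for generator in sigma(world):
            exponents[generator] += power
    return {g: e for g, e in exponents.items() if e != 0}


def in_kernel(generalized_world):
    """True iff the conditional structure is the neutral element 1."""
    return sigma_hat(generalized_world) == {}


def is_balanced(generalized_world):
    """Equivalence to 1 w.r.t. the trivial conditional: exponents sum to 0."""
    return sum(generalized_world.values()) == 0


abc, a_nb_nc = (True, True, True), (True, False, False)
ab_nc, a_nb_c = (True, True, False), (True, False, True)

# The quotient (abc * a~b~c) / (ab~c * a~bc).
quotient = {abc: 1, a_nb_nc: 1, ab_nc: -1, a_nb_c: -1}
print(sigma_hat(quotient), in_kernel(quotient), is_balanced(quotient))
```

The quotient lies in the kernel and has exponent sum 0, so any function that is indifferent with respect to this rule set, in the sense defined in the next section, has to treat its numerator and denominator alike.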

6 Conditional Indifference 

To study conditional interactions, we now focus on the behavior of OCFs
$\kappa : \Omega \to \mathbb{N} \cup \{0, \infty\}$ and probability functions $P : \Omega \to [0, 1]$ with respect
to the elements in $\widehat{\Omega}$. We extend each such function to a homomorphism,
$\kappa : \widehat{\Omega} \to (\mathbb{Z}, +)$, or $P : \widehat{\Omega} \to (\mathbb{R}^+, \cdot)$, respectively, by setting $\kappa(\omega_1^{r_1} \cdot \ldots \cdot \omega_m^{r_m}) =
r_1\kappa(\omega_1) + \ldots + r_m\kappa(\omega_m)$ and $P(\omega_1^{r_1} \cdot \ldots \cdot \omega_m^{r_m}) = P(\omega_1)^{r_1} \cdot \ldots \cdot P(\omega_m)^{r_m}$.
This allows us to analyze numerical relationships holding between different $\kappa(\omega)$,
respectively $P(\omega)$, and to elaborate the conditionals whose structures these
functions follow. In the sequel, let $V$ denote an OCF or a probability function, i.e.
$V = \kappa$ or $V = P$.

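Definition 1. Assume $\mathcal{R} \subseteq (\mathcal{L} \mid \mathcal{L})$ to be a set of conditionals. $V$ is indifferent
with respect to $\mathcal{R}$ iff $V(\widehat{\omega}_1) = V(\widehat{\omega}_2)$ whenever $\sigma_{\mathcal{R}}(\widehat{\omega}_1) = \sigma_{\mathcal{R}}(\widehat{\omega}_2)$ for $\widehat{\omega}_1 \equiv_\top
\widehat{\omega}_2 \in \widehat{\Omega}$, i.e. iff $\ker \sigma_{\mathcal{R}} \cap \ker \sigma_{(\top|\top)} \subseteq \ker V$.

Note that we presupposed all OCFs to be finite and all probability functions to
be positive in order to make this definition a very concise one. For the general
case, cf. [KI01a].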

$V$ being indifferent with respect to $\mathcal{R}$ means that it does not distinguish
between different elements $\widehat{\omega}_1, \widehat{\omega}_2$ with the same conditional structure with
respect to $\mathcal{R}$. Normalizing constraints are taken into account by observing
$\equiv_\top$-equivalence. Conversely, any deviation $\kappa(\widehat{\omega}) \ne 0$, respectively $P(\widehat{\omega}) \ne 1$, can be
explained by the conditionals in $\mathcal{R}$ acting on $\widehat{\omega}$ in a non-balanced way.
Conditional indifference captures interactions of conditionals of arbitrary depth by
making use of the group homomorphism induced by $V$. The next theorem gives
a simple criterion to check conditional indifference (for a proof, cf. [KI01a]):

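Theorem 1. Let $\mathcal{R} = \{(B_1|A_1), \ldots, (B_n|A_n)\}$ be a set of conditionals.

A (finite) OCF $\kappa$ is indifferent with respect to $\mathcal{R}$ iff there are rational numbers
$\kappa_0, \kappa_i^+, \kappa_i^- \in \mathbb{Q}$, $1 \le i \le n$, such that for all $\omega \in \Omega$,

$$\kappa(\omega) = \kappa_0 + \sum_{\substack{1 \le i \le n \\ \omega \models A_iB_i}} \kappa_i^+ + \sum_{\substack{1 \le i \le n \\ \omega \models A_i\overline{B_i}}} \kappa_i^- \qquad (9)$$

A (positive) probability function $P$ is indifferent with respect to $\mathcal{R}$ iff there
are positive real numbers $\alpha_0, \alpha_1^+, \alpha_1^-, \ldots, \alpha_n^+, \alpha_n^- \in \mathbb{R}^+$ such that for all $\omega \in \Omega$,

$$P(\omega) = \alpha_0 \prod_{\substack{1 \le i \le n \\ \omega \models A_iB_i}} \alpha_i^+ \prod_{\substack{1 \le i \le n \\ \omega \models A_i\overline{B_i}}} \alpha_i^- \qquad (10)$$

Note that conditional indifference is a measure-free notion. In particular, it does
not require the conditionals in $\mathcal{R}$ to be represented by $\kappa$ or $P$, respectively. This
qualitative or quantitative information is taken into account in a second step: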

Definition 2. Ordinal conditional functions, $\kappa$, and probability functions, $P$,
representing a set $\mathcal{R}$ of (quantified) conditionals and being indifferent with respect
to it, are called c-representations.

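Theorem 1 provides an intelligible schema to construct c-representations. For
instance, for an OCF-c-representation, we simply set up $\kappa$ according to (9) and
choose the $\kappa_i^+, \kappa_i^-$ appropriately to ensure that $\kappa \models \mathcal{R}^{(*)}$ holds, i.e. such that

$$\kappa_i^- - \kappa_i^+ > (n_i\, +) \min_{\omega \models A_iB_i} \Big( \sum_{\substack{j \ne i \\ \omega \models A_jB_j}} \kappa_j^+ + \sum_{\substack{j \ne i \\ \omega \models A_j\overline{B_j}}} \kappa_j^- \Big) - \min_{\omega \models A_i\overline{B_i}} \Big( \sum_{\substack{j \ne i \\ \omega \models A_jB_j}} \kappa_j^+ + \sum_{\substack{j \ne i \\ \omega \models A_j\overline{B_j}}} \kappa_j^- \Big) \qquad (11)$$

where the optional addition of $n_i$ on the right-hand side allows for variable
strength conditionals. Furthermore, the normalizing constant $\kappa_0$ has to be chosen
appropriately to ensure that actually an OCF is obtained. Once suitable numbers
$\kappa_i^+, \kappa_i^-$ are fixed, $\kappa(\omega)$ can easily be computed from the conditional structure
$\sigma_{\mathcal{R}}(\omega)$ of $\omega$ by replacing each symbol $a_i^+, a_i^-$ by its numerical counterpart $\kappa_i^+, \kappa_i^-$
(note the similarity of (8) with (9) and (10)).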
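The step from (9) to (11) can be spelled out as a short derivation. The sketch assumes the acceptance condition from Section 2, $\kappa \models (B_i|A_i)[n_i]$ iff $\kappa(A_iB_i) + n_i < \kappa(A_i\overline{B_i})$, and abbreviates the contribution of all conditionals other than $r_i$ by $S_i(\omega)$, a shorthand introduced here for illustration.

```latex
% Shorthand (not from the paper): impact of all conditionals other than r_i.
S_i(\omega) \;=\; \sum_{\substack{j \ne i \\ \omega \models A_jB_j}} \kappa_j^+
            \;+\; \sum_{\substack{j \ne i \\ \omega \models A_j\overline{B_j}}} \kappa_j^-

% Every world in A_iB_i contributes \kappa_i^+, every world in
% A_i\overline{B_i} contributes \kappa_i^-, so by (9):
\kappa(A_iB_i) \;=\; \kappa_0 + \kappa_i^+ + \min_{\omega \models A_iB_i} S_i(\omega),
\qquad
\kappa(A_i\overline{B_i}) \;=\; \kappa_0 + \kappa_i^- + \min_{\omega \models A_i\overline{B_i}} S_i(\omega)

% Inserting both into \kappa(A_iB_i) + n_i < \kappa(A_i\overline{B_i})
% and cancelling \kappa_0 gives (11):
\kappa_i^- - \kappa_i^+ \;>\; n_i + \min_{\omega \models A_iB_i} S_i(\omega)
                          - \min_{\omega \models A_i\overline{B_i}} S_i(\omega)
```

For unquantified conditionals the term $n_i$ is simply dropped, which is why it appears in parentheses in (11).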

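By setting $\kappa_i^+ := 0$ for each conditional $r_i \in \mathcal{R}$, and determining $\kappa_i^- \ge 0$
according to (11), we recover the generalized system-Z* (see (6)):

$$\kappa_i^- > \min_{\omega \models A_iB_i} \sum_{\substack{j \ne i \\ \omega \models A_j\overline{B_j}}} \kappa_j^- - \min_{\omega \models A_i\overline{B_i}} \sum_{\substack{j \ne i \\ \omega \models A_j\overline{B_j}}} \kappa_j^- \qquad (12)$$

Note that, due to the consistency of $\mathcal{R}$, also $\kappa_0 = 0$ in this case. By choosing $\kappa_i^- \ge
0$ minimally, we obtain system-Z*. As long as we do not have multiple minimal
solutions $\kappa_i^-$, the algorithm of Bourne & Parsons can be used to calculate these
numbers. We also recover the ME-distribution from (10) (see (1), with $\alpha_i^+ = \alpha_i^{1-x_i}$
and $\alpha_i^- = \alpha_i^{-x_i}$ for $1 \le i \le n$). So both ME-methods and system-Z* obey
the principle of conditional indifference.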
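For small rule sets, minimal solutions of (12) can be found by exhaustive search. The sketch below is not the Bourne & Parsons algorithm; it is a brute-force illustration that assumes $\kappa_i^+ = 0$, restricts each $\kappa_i^-$ to the range 0..3 (an arbitrary bound chosen for the example), and uses the five conditionals of the extended Nixon diamond from Example 2.

```python
from itertools import product

ATOMS = ("p", "q", "r", "a", "b")
WORLDS = [dict(zip(ATOMS, bits)) for bits in product([True, False], repeat=len(ATOMS))]

RULES = [
    (lambda w: w["q"], lambda w: w["p"]),      # r1 = (p|q)
    (lambda w: w["r"], lambda w: not w["p"]),  # r2 = (not p|r)
    (lambda w: w["q"], lambda w: w["a"]),      # r3 = (a|q)
    (lambda w: w["a"], lambda w: w["b"]),      # r4 = (b|a)
    (lambda w: w["q"], lambda w: not w["b"]),  # r5 = (not b|q)
]


def verifies(rule, w):
    return rule[0](w) and rule[1](w)


def falsifies(rule, w):
    return rule[0](w) and not rule[1](w)


def cost(w, k, skip):
    """Sum of impacts of all rules other than rule `skip` that w falsifies."""
    return sum(k[j] for j, rule in enumerate(RULES) if j != skip and falsifies(rule, w))


def satisfies_12(k):
    """Check inequality (12) for every conditional."""
    for i, rule in enumerate(RULES):
        over_verifying = min(cost(w, k, i) for w in WORLDS if verifies(rule, w))
        over_falsifying = min(cost(w, k, i) for w in WORLDS if falsifies(rule, w))
        if not k[i] > over_verifying - over_falsifying:
            return False
    return True


# Exhaustive search over a small, arbitrarily bounded grid of candidates.
solutions = [k for k in product(range(4), repeat=len(RULES)) if satisfies_12(k)]
minimal = min(solutions, key=sum)
print(minimal)
```

Within the searched range the solution with the smallest sum is $(1, 1, 2, 1, 2)$, the same values that the text attributes to Bourne & Parsons' algorithm for this rule set.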

In [BSS00], infinitesimal lcd-plausibility functions are introduced which are
based (twofold) on the principle of least commitment (lc) and on the principle of
auto-deduction (d), which simply states that the plausibility function should
satisfy the conditionals in $\mathcal{R}$. The specific multiplicative structure of lcd-functions
is motivated by using Dempster's rule of combination. These lcd-functions can
be constructed directly from our approach: Instead of the proto-infinitesimals
introduced in [BSS00] to represent infinitesimals symbolically, we use the formal
symbols $a_1^+, \ldots, a_n^-$ which can be assigned infinitesimals (or non-infinitesimal
positive real numbers, in a non-infinitesimal framework) to give rise to
plausibility functions of lcd-type. The specific form of lcd-functions is found to match
the conditional structures of worlds: lcd-functions are indifferent with respect
to $\mathcal{R}$. For a more detailed comparison in the framework of possibility theory, cf.
[KI01b].

Example 3. We continue Example 2, the extended Nixon diamond. (12) yields
$\kappa_2^-, \kappa_4^- > 0$. Therefore, we set $\kappa_2^- := \kappa_4^- := 1$. For determining $\kappa_1^-, \kappa_3^-, \kappa_5^-$, the
table from Example 2 proves to be helpful. We calculate $\kappa_1^- > \min\{\kappa_3^-, \kappa_4^-, \kappa_5^-\} -
\min\{\kappa_3^-, \kappa_4^-, \kappa_5^-\} = 0$, $\kappa_3^- > \min\{\kappa_4^-, \kappa_5^-\}$, $\kappa_5^- > \min\{\kappa_4^-, \kappa_3^-\}$. So we set
$\kappa_1^- := 1$, $\kappa_3^- := \kappa_5^- := 2$. Because these numbers are minimal, they can also be
obtained by applying the Bourne & Parsons-algorithm. Now, $\kappa_c(\omega) = \sum_{\omega \models A_i\overline{B_i}} \kappa_i^-$
can be computed easily. The table below shows $\kappa_c$-rankings for some worlds $\omega$
and compares them to system-Z-rankings.



$\omega$    $\kappa_c(\omega)$    $\kappa^z(\omega)$
pqrab       3                     2
pqrab       5                     2
pqrab       3                     2
pqrab       5                     2
pqrab       2                     1
pqrab       3                     2
pqrab       2                     2
pqrab       3                     2



It is clearly seen that a c-representation imposes a more finely grained structure
on the possible worlds and thus establishes more conditionals. For instance, we
have $\kappa^z \not\models (\bar{b}|pqr\bar{a})$, but $\kappa_c \models (\bar{b}|pqr\bar{a})$, that is, a republican who is a pacifist and
a quaker, but not an American, is supposed not to like baseball. To show that
this is more than pure speculation, compare the conditional structures of the two
worlds involved, $pqr\bar{a}b$ and $pqr\bar{a}\bar{b}$: They are the same, except for conditional $r_5$,
and due to $r_5$, the person is supposed not to like baseball.

In general, system-Z appears to be too cautious to handle conditional
interactions adequately. But the extended Nixon diamond also shows it to be too
bold sometimes: Now we have $\kappa^z \models (p|qr)$, which seems to be quite unintuitive.
This problem also holds for other well-known approaches to conditional
reasoning, cf. [DP96]. Again, the c-representation $\kappa_c$ proves to be more adequate: We
have $\kappa_c(pqr) = 2 = \kappa_c(\bar{p}qr)$, so $\kappa_c$ is completely indeterminate with respect to
$(p|qr)$ - it preserves ambiguity (cf. [BSS00]).

In contrast to the limited applicability of system-Z*, an OCF-c-representation
can be calculated for any consistent set of conditionals:

Proposition 1. $\mathcal{R} \subseteq (\mathcal{L} \mid \mathcal{L})$ is consistent iff a c-representation of $\mathcal{R}$ exists.

The approach (9) (or (3)) already guarantees well-behavedness with
respect to conditional interactions. In particular, all OCF-c-representations
satisfy the property of irrelevance in the following sense (cf. (4)):

Lemma 1. Suppose $\mathcal{R}$ is a set of conditionals, and $d$ is an atomic proposition
not appearing in any of the conditionals in $\mathcal{R}$. Then, for any
OCF-c-representation $\kappa_c$ of $\mathcal{R}$, and for any conditional $(B|A) \in (\mathcal{L} \mid \mathcal{L})$, $\kappa_c \models (B|A)$ iff
$\kappa_c \models (B|Ad)$.

An analogous quantified version of this lemma holds for probability functions,
too. Furthermore, let $|\!\sim^{c}$ be the (nonmonotonic) inference relation based on
considering all c-representations:

$\mathcal{R} \;|\!\sim^{c}\; (B|A)$ iff $\kappa_c \models (B|A)$ for all c-representations $\kappa_c$ of $\mathcal{R}$    (13)

where $\mathcal{R} \subseteq (\mathcal{L} \mid \mathcal{L})$, $(B|A) \in (\mathcal{L} \mid \mathcal{L})$. Then Lemma 1 yields

Corollary 1. Let $\mathcal{R} \subseteq (\mathcal{L} \mid \mathcal{L})$, $(B|A) \in (\mathcal{L} \mid \mathcal{L})$, and let $d$ be an atomic proposition
not appearing in any of the conditionals in $\mathcal{R}$. Then $\mathcal{R} \;|\!\sim^{c}\; (B|A)$ iff $\mathcal{R} \;|\!\sim^{c}\; (B|Ad)$.

Moreover, it is easy to prove that $|\!\sim^{c}$ also fulfills the basic inferential properties
of system P (see, e.g., [BDP97]). But it does not suffer from the drowning
problem, or, more specifically, from the blocking of property inheritance problem
(see, e.g., [BDP97]), as the following example shows:

Example 4. Let $\mathcal{R}$ consist of the rules $(f|b)$ (birds fly), $(b|p)$ (penguins are
birds), $(\bar{f}|p)$ (penguins do not fly) and $(w|b)$ (birds have wings). We will show
$\mathcal{R} \;|\!\sim^{c}\; (w|pb)$, that is, penguins inherit the property of having wings from their
superclass birds, although they are non-typical birds.

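Let $\kappa_c$ be a c-representation of $\mathcal{R}$, with $\kappa_1^+, \kappa_1^-, \kappa_2^+, \kappa_2^-, \kappa_3^+, \kappa_3^-, \kappa_4^+, \kappa_4^-$
specifying the numerical effects of the conditionals, and with normalizing constant
$\kappa_0$. For the fourth conditional $(w|b)$, (11) yields $\kappa_4^- - \kappa_4^+ > 0$ (this can be seen
quickly by listing the conditional structures of all worlds involved), so that $\kappa_4^- > \kappa_4^+$.
By checking the conditional structures of the worlds verifying, or falsifying,
respectively, $(w|pb)$ we find that

$\kappa_c(pbw) = \min\{\kappa_0 + \kappa_1^+ + \kappa_2^+ + \kappa_3^- + \kappa_4^+,\ \kappa_0 + \kappa_1^- + \kappa_2^+ + \kappa_3^+ + \kappa_4^+\}$

and

$\kappa_c(pb\bar{w}) = \min\{\kappa_0 + \kappa_1^+ + \kappa_2^+ + \kappa_3^- + \kappa_4^-,\ \kappa_0 + \kappa_1^- + \kappa_2^+ + \kappa_3^+ + \kappa_4^-\}$.

Due to $\kappa_4^- > \kappa_4^+$, we conclude $\kappa_c(pbw) < \kappa_c(pb\bar{w})$, that is, $\kappa_c \models (w|pb)$.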
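The argument of Example 4 can be checked numerically on one concrete c-representation. The following is a minimal sketch, assuming that the rank of a world is $\kappa_0$ plus $\kappa_i^+$ for every verified and $\kappa_i^-$ for every falsified conditional, and that $\kappa \models (B|A)$ means $\kappa(AB) < \kappa(A\bar{B})$. The numerical values ($\kappa_i^+ = 0$, $\kappa^- = 1, 2, 2, 1$, $\kappa_0 = 0$) and all function names are illustrative assumptions, not taken from the text.

```python
from itertools import product

ATOMS = ("p", "b", "f", "w")  # penguin, bird, flies, has wings
WORLDS = [dict(zip(ATOMS, bits)) for bits in product([True, False], repeat=4)]

# Each rule (B|A) is (antecedent, consequent, kappa_plus, kappa_minus).
# The numbers are hypothetical choices made for this illustration only.
RULES = [
    (lambda w: w["b"], lambda w: w["f"],     0, 1),  # (f|b)      birds fly
    (lambda w: w["p"], lambda w: w["b"],     0, 2),  # (b|p)      penguins are birds
    (lambda w: w["p"], lambda w: not w["f"], 0, 2),  # (not f|p)  penguins do not fly
    (lambda w: w["b"], lambda w: w["w"],     0, 1),  # (w|b)      birds have wings
]
KAPPA_0 = 0  # normalizing constant (assumed; the most plausible world gets rank 0)


def kappa_world(w):
    """Rank of a world: kappa_0 plus the effects of verified/falsified rules."""
    total = KAPPA_0
    for ante, cons, k_plus, k_minus in RULES:
        if ante(w):  # rules that are not applicable contribute nothing
            total += k_plus if cons(w) else k_minus
    return total


def kappa(formula):
    """Rank of a formula: minimum rank of its worlds (infinity if unsatisfiable)."""
    return min((kappa_world(w) for w in WORLDS if formula(w)), default=float("inf"))


def accepts(ante, cons):
    """kappa |= (cons|ante) iff kappa(ante & cons) < kappa(ante & not cons)."""
    return kappa(lambda w: ante(w) and cons(w)) < kappa(lambda w: ante(w) and not cons(w))


# The chosen numbers give a normalized ranking that accepts all four rules ...
is_c_representation = kappa(lambda w: True) == 0 and all(
    accepts(a, c) for a, c, _, _ in RULES
)
# ... and penguin-birds inherit wings.
inherits_wings = accepts(lambda w: w["p"] and w["b"], lambda w: w["w"])
print(is_c_representation, inherits_wings)
```

With these values, the two ranks compared in the example are $\kappa_c(pbw) = 1$ and $\kappa_c(pb\bar{w}) = 2$; they differ exactly by the gap between $\kappa_4^-$ and $\kappa_4^+$.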

Suppose now that we also take the conditionals $(e|b)$ (birds lay eggs), $(\bar{w}|k)$
(kiwis do not have wings) and $(b|k)$ (kiwis are birds) into account.^ Then it can be
shown that any c-representation $\kappa_c$ of this extended set of rules treats both
penguins and kiwis as normal birds with respect to laying eggs: $\kappa_c \models (e|pb), (e|kb)$.
If we further restrict our attention to c-representations with $\kappa_i^+ = 0$ (that is,
the extended system-Z* approach), then moreover, any such c-representation $\kappa_c$
will also satisfy $(w|pb)$ and $(\bar{w}|kb)$. This shows how even complicated
interdependencies between conditionals are treated properly in our framework.

We described this example in detail to show that inference based on
c-representations is largely qualitative and non-numerical. In fact, our argumentation used
mainly structural information, provided by conditional structures.



7 Outlook 

In this paper, we presented a new theory to handle conditionals adequately
in inductive knowledge representation and uncertain reasoning. This new
theory is based on the algebraic notion of conditional structures of worlds. Its
realization via group theory not only provides an elegant methodological
framework for representing conditional knowledge; the same machinery can also be
applied to guide the revision of epistemic states by sets of conditionals in a way
that is compatible with the postulates for iterative belief revision of [DP97], and
with their generalizations presented in [KI99] (cf. [KI01a]). Moreover, the new
conditional theory sketched here can also be used for knowledge discovery (see
[KI00], [KI01a]). It is there that the techniques introduced in this paper reveal
their full power by elaborating conditional structures from numerical
relationships by group theoretical means.

^ I am grateful to an anonymous referee for raising this problem.

So indeed, the conditional theory presented in this paper has important
applications in the whole area of formal knowledge management - uncertain
reasoning, belief revision and knowledge discovery. A comparison with conditional
event theories can be found in [KI01b].
Abstract. We introduce a ranking construction semantics for graded
defaults, formulate the principles of the minimal construction philoso- 
phy, from which we derive a preference order over ranking constructions. 
It defines a powerful rational default inference notion, JLX-entailment, 
which we are going to compare with system JZ. 



1 Introduction 

Since the early eighties, hundreds of formalisms have tried to capture our
basic intuitions about default reasoning. However, a real consensus - e.g. com- 
parable to the broad acceptance of the standard probabilistic framework - has 
not yet been reached, not even for particular application areas. The actual domi- 
nance of specific systems often merely reflects historical peculiarities rather than 
genuine superiority or the presence of stronger justifications. Nevertheless,
independently of popularity considerations, some directions seem to be more
promising than others. Of particular interest are the semantic-based conditional
accounts, which may be seen as a qualitative counterpart to probabilistic reason- 
ing. These approaches are theoretically and conceptually appealing because they 
provide a transparent model-theoretic semantics for default conditionals (defin- 
ing the monotonic logic), as well as reasonable semantic-based default inference 
notions (defining the nonmonotonic logic). 

The simplest and most prominent conditional default formalisms of accept- 
able strength are system Z [Pearl 90] and rational closure [Lehmann and Magi- 
dor 92], which are based on the normality maximization paradigm [Weydert 96]. 
They interpret defaults as constraints on plausibility rankings and prefer those 
rankings making more worlds more plausible. This procedure is quite successful, 
but it still fails to validate many desirable inheritance features. There have been 
several attempts to overcome these problems, e.g. [Geffner, Pearl 92, Lehmann 
95], but most of them rely on more or less arbitrary - and probabilistically ques- 
tionable - prioritization procedures, which sometimes derive their justification 
mainly from the handling of specific examples. 

An important exception is default reasoning based on entropy maximization 
(ME-entailment) [Goldszmidt et al. 93]. Unfortunately, the associated ranking- 
based procedure only works for a very restricted class of suitably non-redundant 
default sets. There has been some progress, but no general solution [Bourne,
Parsons 99, Kern-Isberner 01, Weydert 95,98]. Another exception is system JZ
[Weydert 98], a powerful default inference relation which is based on a natu- 
ral canonical ranking construction algorithm trying to implement the idea of 
information minimization directly within the ranking framework. It stands in 
the tradition of system Z but coincides with ME-entailment for a broad class of 
examples. In particular, system JZ defines a rational consequence relation and 
allows inheritance to exceptional subclasses. However, the corresponding mini- 
mal construction procedure, although well-motivated, is a bit cumbersome and 
opaque. In fact, a simple, purely semantic interpretation is still missing. 

In this paper, we are going to take the direct preferential semantic road. The 
idea is to define a global preference order over ranking constructions, intended to 
implement the minimal construction and the normality maximization philosophy 
in a more reasonable and transparent way. The resulting JLX-ordering is moti- 
vated by three basic principles - minimizing evidence, minimizing surprise, and 
minimizing shifting. It guarantees the existence of a canonical minimal model 
for each consistent finite default base, which gives us a powerful rational default 
inference notion for graded default knowledge, JLX-entailment. Even more in- 
teresting, if we combine JLX-minimization with justifiable constructibility, the 
fourth minimal construction requirement, we get exactly system JZ. 

The paper is built up as follows. First, we introduce the ranking construction 
semantics for defaults, as well as relevant transformation and description func- 
tions. After recalling our general semantic-based default entailment philosophy, 
we present our four minimal construction principles and use the first three to 
define a preference relation over ranking constructions, the JLX-order. It deter- 
mines a rational default inference relation, JLX-entailment, which we illustrate 
with several examples. To conclude, we consider justifiable constructibility, the 
fourth minimal construction requirement, and investigate the relationship be- 
tween JLX-entailment and system JZ. 

2 Ranking Constructions 

Our semantic framework is based on ranking measures, coarse-grained
quasi-probabilistic valuations measuring the degree of disbelief, implausibility or
surprise of propositions. More precisely, we consider standard $\kappa\pi$-measures, which
generalize Spohn's discrete-valued $\kappa$-functions [Spohn 90] and are formally
equivalent to real-valued possibility measures with multiplicative conditionalization
[Dubois, Prade 88] (but without assuming a fuzzy-theoretic interpretation). Of
particular interest is the probabilistic reading of $\kappa\pi$-measures. In fact, we may
interpret each rank $R(A) = \alpha$ as the order of magnitude of an infinitesimal
probability $P(A) = \varepsilon^{\alpha}$ (for an arbitrary but fixed infinitesimal $\varepsilon > 0$). This
relationship allows us to exploit major tools and concepts from classical probability
theory, like belief networks and entropy maximization.

Definition 2.1 (Standard $\kappa\pi$-measures)

Let $\mathcal{B} \subseteq 2^W$ be a compact boolean algebra of propositions and $[0,\infty]$ be the set of
non-negative reals and infinity. $R : \mathcal{B} \to [0,\infty]$ is a standard $\kappa\pi$-measure iff

• 1. $R(W) = 0$, 2. $R(\emptyset) = \infty$, 3. $R(A \cup B) = \min\{R(A), R(B)\}$.

The conditional $\kappa\pi$-measure is defined by $R(B|A) = R(B \cap A) - R(A)$ for
$R(A) \neq \infty$, else $R(B|A) = \infty$. $R_0$ is the uniform $\kappa\pi$-measure, with $R_0(A) = 0$
for $A \neq \emptyset$. If $R$ satisfies only 2 and 3, we call it a $\kappa\pi$-pseudo-measure.
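For a finite set of worlds, the definition can be illustrated by storing one rank per world and taking minima. This is a minimal sketch under that assumption; the two-atom world set, the concrete ranks and the helper names `prop`, `rank` and `cond_rank` are hypothetical and only serve the illustration.

```python
import math
from itertools import product

# Worlds are truth assignments to two atoms (a, b); a finite stand-in for W.
WORLDS = list(product([True, False], repeat=2))


def prop(pred):
    """A proposition is the set of worlds satisfying a predicate."""
    return frozenset(w for w in WORLDS if pred(*w))


def rank(world_rank, A):
    """R(A): minimal rank of a world in A; the empty proposition gets infinity."""
    return min((world_rank[w] for w in A), default=math.inf)


def cond_rank(world_rank, B, A):
    """R(B|A) = R(B and A) - R(A) if R(A) is finite, else infinity."""
    r_a = rank(world_rank, A)
    return math.inf if r_a == math.inf else rank(world_rank, A & B) - r_a


# Illustrative world ranks; at least one world has rank 0, so R(W) = 0.
world_rank = {(True, True): 0, (True, False): 1, (False, True): 2, (False, False): 2}

W = frozenset(WORLDS)
A = prop(lambda a, b: a)
B = prop(lambda a, b: b)

print(rank(world_rank, W))              # condition 1
print(rank(world_rank, frozenset()))    # condition 2
print(rank(world_rank, A | B) == min(rank(world_rank, A), rank(world_rank, B)))  # condition 3
print(cond_rank(world_rank, W - B, A))  # R(not b | a)
```

Condition 3 holds by construction here, because the rank of a union is the minimum over all of its worlds.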

The domain $\mathcal{B}$ usually consists of the model/world sets over some classical
compact background logic $(\mathcal{L}, \vdash)$. In what follows, we assume that $\mathcal{B}$ and $(\mathcal{L}, \vdash)$
have been fixed. In this context, let $Mod_{\kappa\pi}$ be the set of all $\kappa\pi$-measures over $\mathcal{B}$.
For convenience, we use the sentences $A \in \mathcal{L}$ also to denote the corresponding
model sets $Mod(A) \in \mathcal{B}$, e.g. abbreviating $R(Mod(A))$ by $R(A)$.

Following Spohn [Spohn 88], ranking measures - similar to subjective
probability measures - can be used to model epistemic states. The idea is to identify the
belief strength $Bel(A)$ with the degree of surprise of $\neg A$, namely $R(\neg A)$. That
is, under the most liberal interpretation, $A$ is believed iff $Bel(A) = R(\neg A) > 0$.
It is believed with strength $s$ (at least) iff $R(\neg A) \geq s$. This definition has the
advantage that it supports full belief, i.e. belief closed under conjunction (and
logical entailment).

We consider two basic transformations for $\kappa\pi$-pseudo-measures, shifting and
normalization. Given a $\kappa\pi$-pseudo-measure $S$, shifting a proposition $A$ by the
amount $\alpha$ means uniformly increasing the ranks of $A$-worlds - more precisely, of
$A$-subpropositions - by $\alpha$. Normalization means uniformly downwards shifting
until the $\kappa\pi$-pseudo-measure becomes a $\kappa\pi$-measure.

Definition 2.2 (Shifting, normalization)

Shifting is a ternary function $+ : (S, \alpha, A) \mapsto S + \alpha A$ (also written $S[A + \alpha]$)
defined on $\kappa\pi$-pseudo-measures $S$, $A \in \mathcal{B}$, and $\alpha \in [0,\infty]$, such that for $B \in \mathcal{B}$,

• $S + \alpha A$ is a $\kappa\pi$-pseudo-measure,

• $(S + \alpha A)(B|A) = S(B|A)$ and $(S + \alpha A)(B|\neg A) = S(B|\neg A)$,

• $(S + \alpha A)(A) = S(A) + \alpha$ and $(S + \alpha A)(\neg A) = S(\neg A)$.

Normalization is a unary function mapping a $\kappa\pi$-pseudo-measure $S$ to the
$\kappa\pi$-measure $|S|$ defined by $|S|(A) = S(A) - S(W)$ ($\infty - \infty = 0$).
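On world-level ranks, both transformations reduce to simple arithmetic. The sketch below assumes a finite world set and represents a pseudo-measure as a dictionary from worlds to ranks; the names `shift` and `normalize` and the example values are hypothetical.

```python
import math
from itertools import product

WORLDS = list(product([True, False], repeat=2))  # truth values for atoms (a, b)


def prop(pred):
    return frozenset(w for w in WORLDS if pred(*w))


def rank(S, A):
    """S(A): minimal rank of a world in A (infinity for the empty proposition)."""
    return min((S[w] for w in A), default=math.inf)


def shift(S, alpha, A):
    """S + alpha*A: raise the rank of every A-world by alpha, leave the rest alone."""
    return {w: (r + alpha if w in A else r) for w, r in S.items()}


def normalize(S):
    """|S|: subtract S(W) from every rank, using the convention inf - inf = 0."""
    base = min(S.values())
    if base == math.inf:
        return {w: 0 for w in S}
    return {w: r - base for w, r in S.items()}


W = frozenset(WORLDS)
A = prop(lambda a, b: a)
B = prop(lambda a, b: b)

S = {w: 0 for w in WORLDS}   # uniform measure R_0
T = shift(S, 2, A)           # every A-world now has rank 2
U = shift(T, 1, W - A)       # no world has rank 0 any more: a pseudo-measure
N = normalize(U)             # back to a proper measure

print(rank(T, A), rank(T, W - A), rank(U, W), rank(N, W), rank(N, A))
```

Because shifting adds the same amount to every $A$-world, differences between ranks inside $A$ (and inside $\neg A$) are unchanged, which is the content of the second bullet of the definition.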

With shifting and normalization, we can describe a simple and natural revision
concept for $\kappa\pi$-measures which uses Jeffrey-conditionalization and goes back to
Spohn. For consistent evidence (represented by) $A \in \mathcal{B}$ and $\alpha \in [0,\infty]$, the
parametrized Spohn-type revision procedure $\star_{sp}$ enforces a belief strength of at
least $\alpha$ by shifting $\neg A$ as far as necessary, followed by a normalization step.

Definition 2.3 (Spohn-type revision)

We define $\star_{sp} : Mod_{\kappa\pi} \times \mathcal{B} \times [0,\infty] \to Mod_{\kappa\pi}$ with $(R, A, \alpha) \mapsto R \star_{sp}^{\alpha} A$ s.t.

• $R(\neg A) \geq \alpha$ or $R(A) = \infty$ : $R \star_{sp}^{\alpha} A = R$.

• $R(\neg A) < \alpha$ and $R(A) < \infty$ : $R \star_{sp}^{\alpha} A = |R + (\alpha - R(\neg A) + R(A))\neg A|$.
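The two cases of the revision can be sketched directly on world-level ranks. This is an illustration under the assumption of a finite world set, with shifting and normalization implemented as rank arithmetic; the function name `revise` and the example inputs are hypothetical.

```python
import math
from itertools import product

WORLDS = list(product([True, False], repeat=2))  # truth values for atoms (a, b)
W = frozenset(WORLDS)


def prop(pred):
    return frozenset(w for w in WORLDS if pred(*w))


def rank(R, X):
    return min((R[w] for w in X), default=math.inf)


def shift(S, alpha, X):
    return {w: (r + alpha if w in X else r) for w, r in S.items()}


def normalize(S):
    base = min(S.values())
    return {w: (0 if base == math.inf else r - base) for w, r in S.items()}


def revise(R, A, alpha):
    """Spohn-type revision: make A believed with strength at least alpha."""
    not_A = W - A
    # Case 1: A is already believed strongly enough, or A is impossible.
    if rank(R, not_A) >= alpha or rank(R, A) == math.inf:
        return dict(R)
    # Case 2: shift not-A by alpha - R(not A) + R(A), then normalize.
    amount = alpha - rank(R, not_A) + rank(R, A)
    return normalize(shift(R, amount, not_A))


A = prop(lambda a, b: a)
R0 = {w: 0 for w in WORLDS}     # uniform prior
R1 = revise(R0, A, 2)           # believe A with strength 2
R2 = revise(R1, W - A, 1)       # then believe not-A with strength 1
R3 = revise(R1, A, 1)           # weaker evidence for A changes nothing

print(rank(R1, W - A), rank(R1, A))
print(rank(R2, A), rank(R2, W - A))
print(R3 == R1)
```

After the second step the surprise of $A$ is exactly 1 and $\neg A$ has rank 0, so the revision reaches the requested strength without overshooting.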

Given a prior $\kappa\pi$-measure $R$ and a collection $\mathcal{I} \subseteq \mathcal{B}$ of possible consistent
evidential inputs, we are interested in those $\kappa\pi$-measures which can be reached by
iterated revision with $A_i \in \mathcal{I}$ starting at $R$. We present two perspectives.

Definition 2.4 (Epistemic accessibility, constructibility)

$R'$ is epistemically accessible from $R$ over $\mathcal{I}$ iff for some $\alpha_i \in [0,\infty]$, $A_i \in \mathcal{I}$,

• $R' = R \star_{sp}^{\alpha_0} A_0 \ldots \star_{sp}^{\alpha_n} A_n$.

$R'$ is epistemically constructible from $R$ over $\mathcal{S}$ iff for some $\alpha_i \in [0,\infty]$, $A_i \in \mathcal{S}$,

• $R' = |R + \alpha_0 A_0 + \ldots + \alpha_n A_n|$.

$R \star_{sp} \mathcal{I}$ / $R + \mathcal{S}$ is the class of $\kappa\pi$-measures epistemically accessible/constructible
from $R$ over $\mathcal{I}$ / $\mathcal{S}$.

Obviously, $R'$ is epistemically accessible from $R$ over $\mathcal{I}$ iff $R'$ is epistemically
constructible from $R$ over $\neg\mathcal{I} = \{\neg A \mid A \in \mathcal{I}\}$, i.e. $R \star_{sp} \mathcal{I} = R + \neg\mathcal{I}$.

3 Default Semantics 

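On top of $\mathcal{L}$, we consider a graded default conditional language $\mathcal{L}(\Rightarrow)$ using
positive rational strength-parameters $s$ (with $\Rightarrow \; = \; \Rightarrow^{1}$).

• $\mathcal{L}(\Rightarrow) = \{A \Rightarrow^{s} A' \mid A, A' \in \mathcal{L}, 0 < s \in Rat \cup \{\infty\}\}$.

The standard $\kappa\pi$-semantics for $\mathcal{L}(\Rightarrow)$ is based on the satisfaction relation $\models_{\kappa\pi}$.

• $R \models_{\kappa\pi} A \Rightarrow^{s} A'$ iff $R(A \wedge A') + s \leq R(A \wedge \neg A')$.

Monotonic entailment is defined as usual. For each $\Delta \subseteq \mathcal{L}(\Rightarrow)$, let $Mod_{\kappa\pi}(\Delta)$
$= \{R \in Mod_{\kappa\pi} \mid R \models_{\kappa\pi} \Delta\}$ be the set of $\kappa\pi$-models of $\Delta$. Unfortunately, the
usual ranking semantics for default conditionals conflicts either with desirable
inheritance features or with basic nonmonotonic inference patterns. In fact, we
know that any default formalism validating for all logically independent $\varphi, \psi$,

• Exceptional inheritance: $\{\neg\varphi\} \cup \{\mathrm{T} \Rightarrow \varphi, \mathrm{T} \Rightarrow \psi\} \;|\!\sim\; \psi$,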

cannot be invariant under the substitution of $\kappa\pi$-semantically equivalent
default sets. Otherwise, equivalent consistent premises could produce conflicting
conclusions - consider e.g. $\{\neg\varphi\} \cup \{\mathrm{T} \Rightarrow \varphi, \mathrm{T} \Rightarrow (\varphi \leftrightarrow \psi)\} \;|\!\sim\; \neg\psi$ - which is
unacceptable. In other words, the standard $\kappa\pi$-semantics is not fine-grained enough
to capture relevant independency information implicitly encoded in the choice of
defaults and affecting our intuitions about admissible default conclusions.
Therefore, we may want to extend our default semantics so as to make it sensitive to this
sort of structural information. More concretely, we are going to introduce refined
semantic entities exploiting the epistemic construction perspective. The idea is
to consider not the $\kappa\pi$-measures, but the collections of shifting steps, i.e. the
$\kappa\pi$-constructions, summarizing their update history. This allows a closer match
of the default knowledge structure.

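Definition 3.1 ($\kappa\pi$-construction semantics)

A $\kappa\pi$-construction $\sigma$ is a sequence $(a_i, A_i \mid i \leq n)$, written $a_0 A_0 + \ldots + a_n A_n$,
where $A_i \in \mathcal{B}$, $a_i \in [0,\infty]$, and $R_\sigma = R_0 + a_0 A_0 + \ldots + a_n A_n$ is a proper
$\kappa\pi$-measure. Let $S_\sigma = \{A_i \mid i \leq n\}$. The $\kappa\pi{+}$-semantics for $\kappa\pi$-constructions over
$\mathcal{L}(\Rightarrow)$ is given by the satisfaction relation

• $\sigma \models_{\kappa\pi+} A \Rightarrow^{s} A'$ iff $R_\sigma \models_{\kappa\pi} A \Rightarrow^{s} A'$ and $Mod(A \wedge \neg A') \in S_\sigma$.

As before, we set $\sigma \models_{\kappa\pi} A \Rightarrow^{s} A'$ iff $R_\sigma \models_{\kappa\pi} A \Rightarrow^{s} A'$.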

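This immediately neutralizes the exceptional inheritance paradox. An important
fact is the equivalence of $\kappa\pi$- and $\kappa\pi{+}$-consistency. Let $\Delta = \{A_i \Rightarrow^{s_i} A'_i \mid i \leq n\}$
and $\Delta^{mod}_{\to} = \{Mod(A \to A') \mid A \Rightarrow^{s} A' \in \Delta\}$. If $\Delta$ has a $\kappa\pi$-measure model, we
may find a $\kappa\pi{+}$-model of $\Delta$ of the form $\sigma = a_0(A_0 \wedge \neg A'_0) + \ldots + a_n(A_n \wedge \neg A'_n)$.

Theorem 3.2 (Model accessibility)

$Mod_{\kappa\pi}(\Delta) \neq \emptyset$ iff $Mod_{\kappa\pi}(\Delta) \cap (R_0 \star_{sp} \Delta^{mod}_{\to}) \neq \emptyset$ iff $Mod_{\kappa\pi+}(\Delta) \neq \emptyset$.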

Our specification of default entailment relations within the $\kappa\pi{+}$-framework uses
several auxiliary functions to evaluate the characteristics of $\kappa\pi$-constructions,
e.g. the amount of surprise or the shifting efforts associated with $\sigma = \Sigma_{i \leq n} a_i A_i$.

Definition 3.3 (Active rank)

The active rank function $r_\sigma : \mathcal{B} \to [0,\infty]$ for $\sigma$ is given by

• $r_\sigma(A) = \sup\{s \geq 0 \mid A \subseteq \bigcup\{A_j \mid R_\sigma(A_j) \geq s, a_j > 0\}\}$.

The active rank $r_\sigma(A)$ of a proposition $A$ is the maximal rank $s$ so that $A$ is
covered by actively shifted propositions $A_j$ of rank at least $s$. Obviously, we
have $r_\sigma(A) \leq R_\sigma(A)$. For instance, if $A, B, P \in \mathcal{B}$ are logically independent and
$\sigma = 1A + 1B + 0(A \wedge B \wedge P)$, then $r_\sigma(A \wedge B \wedge P) = 1 < 2 = R_\sigma(A \wedge B \wedge P)$.
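The active rank can be computed mechanically when the world set is finite. The following sketch assumes that a construction is given as a list of pairs (amount, proposition), that $R_\sigma$ of a world is the sum of the amounts of all listed propositions containing it, and that the supremum of the empty set is taken to be 0; the helper names are hypothetical. It reproduces the example $\sigma = 1A + 1B + 0(A \wedge B \wedge P)$.

```python
import math
from itertools import product

WORLDS = list(product([True, False], repeat=3))  # truth values for atoms (A, B, P)


def prop(pred):
    return frozenset(w for w in WORLDS if pred(*w))


def rank(sigma, X):
    """R_sigma(X): minimum over the worlds of X of the summed shifts hitting the world."""
    return min((sum(a for a, Y in sigma if w in Y) for w in X), default=math.inf)


def active_rank(sigma, X):
    """Largest s such that X is covered by effectively shifted propositions of rank >= s."""
    if not X:
        return math.inf  # the empty proposition is covered at every level
    shifted = [(rank(sigma, Y), Y) for a, Y in sigma if a > 0]
    # The cover only changes at the ranks of shifted propositions, so it is
    # enough to test these thresholds from the highest one downwards.
    for s in sorted({r for r, _ in shifted}, reverse=True):
        cover = frozenset().union(*(Y for r, Y in shifted if r >= s))
        if X <= cover:
            return s
    return 0  # not covered at any level (assumed value of the empty supremum)


A = prop(lambda a, b, p: a)
B = prop(lambda a, b, p: b)
P = prop(lambda a, b, p: p)
ABP = A & B & P

sigma = [(1, A), (1, B), (0, ABP)]
print(active_rank(sigma, ABP), rank(sigma, ABP))
```

The proposition $A \wedge B \wedge P$ is shifted by 0 itself, so only $A$ and $B$ (both of rank 1) can cover it, although each of its worlds receives both shifts and ends up at rank 2.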

Definition 3.4 (Cumulative surprise)

The cumulative surprise functions $sur_\sigma, sur^{+}_\sigma : [0,\infty] \to 2^{\mathcal{B}}$ for $\sigma$ are

• $sur_\sigma(s) = \{A_i \mid i \leq n, s \leq r_\sigma(A_i)\}$.

• $sur^{+}_\sigma(s) = \{A_i \mid i \leq n, s < r_\sigma(A_i)\}$.

$sur_\sigma(s)$ / $sur^{+}_\sigma(s)$ is the set of all those shiftable propositions getting at least
active rank $s$ / active rank higher than $s$. For instance, if $\sigma = 1A + 2B + 0P$, the weak
cumulative surprise function for $\sigma$ is characterized by $sur_\sigma(0) = S_\sigma$, $sur_\sigma(s) = \{A, B\}$
for $s \in ]0,1]$, $sur_\sigma(s) = \{B\}$ for $s \in ]1,2]$, and $sur_\sigma(s) = \emptyset$ for $s \in ]2,\infty]$.
If $\sigma = 1A + 2(A \vee B)$, the relevant sets and values are $sur_\sigma(2) = \{A, A \vee B\}$
and $sur_\sigma(3) = \{A\}$. Obviously, it is enough to know $sur_\sigma(s)$ or $sur^{+}_\sigma(s)$ for
$s \in \{R_\sigma(A) \mid A \in \mathcal{B}\}$. Whereas $sur_\sigma, sur^{+}_\sigma$ indicate the active surprise structure
of $\sigma$, the binary function $sh_\sigma$ describes the fine-grained local shifting structure.

Definition 3.5 (Shifting effort)

The shifting effort $sh_\sigma : [0,\infty]^2 \to 2^{\mathcal{B}}$ for $\sigma$ at rank $s$ and shifting length $h$ is

• $sh_\sigma(s, h) = \{A_i \mid i \leq n, r_\sigma(A_i) = s, a_i = h\}$.

$sh_\sigma(s, h)$ collects those effectively shifted propositions of active rank $s$ which are
shifted by the amount $h$. For instance, if $\sigma = 1A + 1B + 2(A \vee B)$, we have
$sh_\sigma(1, x) = sh_\sigma(2, x) = \emptyset$, $sh_\sigma(3, 1) = \{A, B\}$, and $sh_\sigma(3, 2) = \{A \vee B\}$.
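The examples for the cumulative surprise and shifting effort functions can be recomputed on a finite world set. This is a sketch under the same assumptions as the definitions suggest: a construction is a list of pairs (amount, proposition), the supremum of the empty set is 0, and all helper names (`sur`, `sur_plus`, `sh`) are hypothetical.

```python
import math
from itertools import product

WORLDS = list(product([True, False], repeat=3))  # truth values for atoms (A, B, P)


def prop(pred):
    return frozenset(w for w in WORLDS if pred(*w))


def rank(sigma, X):
    return min((sum(a for a, Y in sigma if w in Y) for w in X), default=math.inf)


def active_rank(sigma, X):
    if not X:
        return math.inf
    shifted = [(rank(sigma, Y), Y) for a, Y in sigma if a > 0]
    for s in sorted({r for r, _ in shifted}, reverse=True):
        if X <= frozenset().union(*(Y for r, Y in shifted if r >= s)):
            return s
    return 0


def sur(sigma, s):
    """Shiftable propositions with active rank at least s."""
    return {Y for _, Y in sigma if s <= active_rank(sigma, Y)}


def sur_plus(sigma, s):
    """Shiftable propositions with active rank strictly above s."""
    return {Y for _, Y in sigma if s < active_rank(sigma, Y)}


def sh(sigma, s, h):
    """Propositions of active rank s that are shifted by exactly h."""
    return {Y for a, Y in sigma if active_rank(sigma, Y) == s and a == h}


A = prop(lambda a, b, p: a)
B = prop(lambda a, b, p: b)
P = prop(lambda a, b, p: p)

sigma1 = [(1, A), (2, B), (0, P)]          # 1A + 2B + 0P
sigma2 = [(1, A), (2, A | B)]              # 1A + 2(A or B)
sigma3 = [(1, A), (1, B), (2, A | B)]      # 1A + 1B + 2(A or B)

print(sur(sigma1, 0) == {A, B, P}, sur(sigma1, 1) == {A, B}, sur(sigma1, 2) == {B})
print(sur(sigma2, 2) == {A, A | B}, sur(sigma2, 3) == {A})
print(sh(sigma3, 3, 1) == {A, B}, sh(sigma3, 3, 2) == {A | B})
```

In the last construction all three propositions reach rank 3, so the shifting effort function is the only one of the three that distinguishes the short moves on $A$ and $B$ from the longer move on $A \vee B$.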
4 Default Entailment Philosophy

Default reasoning in the context of a plausibility semantics $\models_{pl}$ for default
conditionals usually takes the following form. For a fact base $\Sigma \subseteq \mathcal{L}$, a default base
$\Delta \subseteq \mathcal{L}(\Rightarrow)$ and a potential conclusion $\psi \in \mathcal{L}$, we first collect the corresponding
plausibility models - e.g. plausibility orders, $\kappa\pi$-measures or $\kappa\pi$-constructions -
of $\Delta$ in $Mod_{pl}(\Delta)$. Using a suitable choice criterion - e.g. minimizing surprise or
construction efforts - we determine the preferred plausibility models of $\Delta$ and
put them into $Pref(\Delta) \subseteq Mod_{pl}(\Delta)$. Then we accept $\psi$ as a default conclusion
of $\Sigma$ and $\Delta$ iff in each distinguished model of $\Delta$, $\psi$ is sufficiently plausible given
$\Sigma$. More precisely, if $\Sigma = \{\varphi_1, \ldots, \varphi_p\}$ and $\varphi_\Sigma = \varphi_1 \wedge \ldots \wedge \varphi_p$, there should
be a conditional $\varphi_\Sigma \Rightarrow^{s} \psi$ valid in each preferred plausibility model. Within
the $\kappa\pi$-framework, this amounts to requiring $R(\neg\psi|\varphi_\Sigma) > 0$. Being the weakest
possible condition, this maximizes the set of defeasible conclusions. The default
entailment notion is thus fixed by the plausibility semantics $\models_{pl}$, the preferred
model function $Pref$, and the sufficient plausibility requirement.

• $\Sigma \cup \Delta \;|\!\sim^{Pref}_{pl}\; \psi$ iff $\exists s > 0$ $Pref(\Delta) \models_{pl} \varphi_\Sigma \Rightarrow^{s} \psi$.

From this definition, the preferential nature of plausibility models, and the
closure of preferential conditional theories under intersections, it automatically
follows that $|\!\sim^{Pref}_{pl}$ is a preferential consequence relation [Kraus et al. 90].
Consequently, we may describe it by a pre-order over $\mathcal{L}$-worlds, which of course
depends on $\Delta$. A different question is whether $Pref$ itself is preferential in the
sense that it results from a pre-order $\preceq$ over plausibility models, i.e. whether
$Pref(\Delta) = Min_{\preceq}(Mod_{pl}(\Delta))$. This would give us a preferential consequence
relation over $\mathcal{L}(\Rightarrow)$ extending $\vdash_{pl}$.

Of particular interest are of course those powerful approaches singling out a
canonical preferred ranking model, e.g. rational closure/system Z [Lehmann,
Magidor 92, Pearl 90], or lexicographic closure [Lehmann 95]. For each
consistent, finite $\Delta$, we then obtain a rational consequence relation. System JZ
[Weydert 98], a well-behaved default inference notion anchored in the epistemic
construction framework, also belongs to this category. It is based on the minimal
effort construction of a canonical $\kappa\pi$-measure model of $\Delta$. However, system JZ
encodes the minimal construction philosophy through a somewhat more opaque
- although well-motivated - algorithm. We would certainly prefer a more
flexible, semantic-oriented approach, based for instance on the explicit comparison
of construction efforts. Accordingly, we are going to look for preferred model
functions resulting from suitable preference orderings $\preceq$ over $\kappa\pi$-constructions.

• $Pref(\Delta) = Min_{\preceq}(Mod_{\kappa\pi+}(\Delta))$.

The idea is to prefer those $\kappa\pi$-constructions with the highest inherent
plausibility and requiring the lowest construction efforts. To determine appropriate
preference relations we are going to exploit four intuitive informal guidelines
which reflect different faces of this minimal construction philosophy. The first
principle restricts the collection of shiftable propositions, the second one
maximizes the plausibility of $R_\sigma$ lexicographically, the third one minimizes at each
level the local shifting efforts, and the fourth one attacks shifting redundancy.
So, let $\{R(A'_i) + s_i \leq R(A_i) \mid i \leq n\}$ be the set of constraints resulting from a
finite default base $\Delta$ and $\sigma$ be a $\kappa\pi$-construction. In the examples, we assume
that the propositions $A, A', B$ are logically independent.

P1. Minimizing evidence.

The set of shiftable propositions $S_\sigma$ should be minimized, i.e. restricted to the
exceptional areas from the defaults given in $\Delta$ ($P \wedge \neg P' \in S_\sigma$ iff $P \Rightarrow^{s} P' \in \Delta$).

This is a simple structural parsimony principle. For instance, $\sigma_1 = 2A$ is
preferable to $\sigma_2 = 1A + 0B$, although $R_{\sigma_2}(A) < R_{\sigma_1}(A)$. Furthermore, $\sigma_1 = 1\neg A + 1\neg B$
is a better model of $\Delta = \{\mathrm{T} \Rightarrow A, \mathrm{T} \Rightarrow B\}$ than $\sigma_2 = 0.5\neg A + 0.5\neg B + 0.5(\neg A \wedge B) + 0.5(A \wedge \neg B)$.
In fact, without this requirement, our approach would be
indistinguishable from pure normality maximization, i.e. mishandle exceptional
inheritance.

P2. Minimizing surprise.

$\kappa\pi$-constructions making more propositions less surprising should be preferable.
This means minimizing the set of shiftable - or at least effectively shifted
- propositions $A_i$ arriving at any given rank $s$, i.e. verifying $R_\sigma(A_i) \geq s$.
Because increasing the degree of surprise of more plausible propositions has a higher
informational impact than doing so for less plausible ones, we should start
minimization at the bottom and proceed lexicographically.

For instance, we should prefer $\sigma_1 = 1(A \vee B) + 0A + 2(A \wedge B)$ to $\sigma_2 =
1(A \vee B) + 1A + 0(A \wedge B)$ because $sur_{\sigma_1}(2) = \{A \wedge B\} \subset \{A, A \wedge B\} = sur_{\sigma_2}(2)$.
This principle strengthens and adapts the classical normality maximization
philosophy to the epistemic construction framework.

P3. Minimizing shifting.

The length of shifting moves aimed at pushing propositions to a given rank should
be minimized, starting with the longest, most costly ones, before proceeding
lexicographically towards the shorter ones.

For instance, we should prefer $\sigma_1 = 1A + 1B + 1(A \vee B)$ to $\sigma_2 = 2A + 2B + 0(A \vee B)$.
Although $R_{\sigma_i}(A) = R_{\sigma_i}(B) = R_{\sigma_i}(A \vee B) = 2$, $\sigma_2$ should be rejected because it
uses longer shifting moves (of length $2 > 1$) to reach rank 2. This requirement
reflects the minimal effort philosophy locally. It takes into account that the
evaluation of efforts is best done relative to a specific task in a specific context, here
to build up a particular ranking level. It supports uniqueness, which would fail
if we maximized the shorter moves.

P4. Justifiable constructibility (w.r.t. $\Delta$).

There should be no unjustified, redundant shifting moves w.r.t. the satisfaction
of the ranking constraints from $\Delta$. That is, a shifting of $A_i$ may only occur - in
other words $0 < a_i$ - if there is no oversatisfaction, i.e. if we do not only have
$R_\sigma(A'_i) + s_i \leq R_\sigma(A_i)$, but even $R_\sigma(A'_i) + s_i = R_\sigma(A_i)$. Note that justifiable
constructibility has to be evaluated w.r.t. a specific $\Delta$.

For instance, if $\Delta = \{\mathrm{T} \Rightarrow A, \mathrm{T} \Rightarrow^{2} A \wedge A'\}$, we get the constraints
$\{R(\neg A) \geq 1, R(\neg A \vee \neg A') \geq 2\}$. Then $\sigma_1 = 2(\neg A \vee \neg A') + 0\neg A$ is preferred to
$\sigma_2 = 2(\neg A \vee \neg A') + 1\neg A$. In fact, $R_{\sigma_1}(A \wedge A') + 2 = R_{\sigma_2}(A \wedge A') + 2 =
2 = R_{\sigma_1}(\neg A \vee \neg A') = R_{\sigma_2}(\neg A \vee \neg A')$, whereas $R_{\sigma_1}(A) + 1 = 1 < 2 = R_{\sigma_1}(\neg A)$
and $R_{\sigma_2}(A) + 1 = 1 < 3 = R_{\sigma_2}(\neg A)$. That is, the shifting of $\neg A$ is redundant
and only $\sigma_1$ is justifiably constructible.

5 JLX-Entailment 

In this section, we exploit the first three principles to build a $\kappa\pi$-construction
ordering $\prec_{jlx}$. Starting with $S_\sigma$ (P1), the general idea is to proceed
lexicographically and bottom-up, minimizing at any given rank $s$, first the set of shiftable
propositions arriving at $s$, secondly the local shifting efforts directed at
reaching $s$. More precisely, when comparing two $\kappa\pi$-constructions $\sigma_1, \sigma_2$, we look
for the first rank $s$ where the shifting moves diverge and pick the one with
the smaller cumulative surprise set $sur_{\sigma_i}(s)$ (P2). If the $sur_{\sigma_i}(s)$ don't differ,
we concentrate on the common active rank $s$ part, i.e. $sur_{\sigma_i}(s) - sur^{+}_{12}$, where
$sur^{+}_{12} = sur^{+}_{\sigma_1}(s) \cup sur^{+}_{\sigma_2}(s)$. In this context, we choose the $\sigma_i$ with the
set-theoretically smallest set of shifts $sh^{+}_{i}(s,h) = sh_{\sigma_i}(s,h) - sur^{+}_{12}$ at the largest
shifting length $h$ where they split, i.e. where $sh^{+}_{1}(s,h) \neq sh^{+}_{2}(s,h)$ (P3).

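Definition 5.1 (Construction order)

Let $\sigma_1, \sigma_2$ be $\kappa\pi$-constructions and $S_{\sigma_1}, S_{\sigma_2}$ be the corresponding sets of shiftable
propositions. Then $\sigma_1 \prec_{jlx} \sigma_2$ iff

• $S_{\sigma_1} \subset S_{\sigma_2}$, or

• $S_{\sigma_1} = S_{\sigma_2}$ and for $s = Min\{s' \in [0,\infty] \mid \exists h \; sh^{+}_{1}(s', h) \neq sh^{+}_{2}(s', h)\}$,

  • $sur_{\sigma_1}(s) \subset sur_{\sigma_2}(s)$, or

  • $sur_{\sigma_1}(s) = sur_{\sigma_2}(s)$, $s \neq \infty$ and for
    $h = Max\{h' \mid sh^{+}_{1}(s, h') \neq sh^{+}_{2}(s, h')\}$, $sh^{+}_{1}(s, h) \subset sh^{+}_{2}(s, h)$.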

It is clear that $\prec_{jlx}$ is a partial order. To see how it works, we may illustrate its
specification with some examples (P1: 1, P2: 2, 3, P3: 4).

1. $3(A \vee B) \prec_{jlx} 1A + 1B + 1(A \vee B)$ - because $\{A \vee B\} \subset \{A, B, A \vee B\}$.

2. $1A + 3B \prec_{jlx} 2A + 2B$ - because $sur_{\sigma_1}(2) = \{B\} \subset \{A, B\} = sur_{\sigma_2}(2)$.

3. $1A + 2B$, $2A + 1B$ are incomparable - because $\{A\} \not\subseteq \{B\}$, $\{B\} \not\subseteq \{A\}$.

4. $2A + 2B + 2(A \vee B) \prec_{jlx} 3A + 3B + 1(A \vee B)$ - for $2 < 3$, $\emptyset \subset \{A, B\}$.
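The four examples can be replayed with a small comparison routine. The sketch below is one possible reading of the construction order, not a definitive implementation: it assumes a finite world set, takes the supremum of the empty set to be 0, and assumes that the set removed from the shifting efforts at rank $s$ is the union of the two strict cumulative surprise sets at $s$. The function name `jlx_less` and all helpers are hypothetical.

```python
import math
from itertools import product

WORLDS = list(product([True, False], repeat=2))  # truth values for atoms (A, B)


def prop(pred):
    return frozenset(w for w in WORLDS if pred(*w))


def rank(sigma, X):
    return min((sum(a for a, Y in sigma if w in Y) for w in X), default=math.inf)


def active_rank(sigma, X):
    if not X:
        return math.inf
    shifted = [(rank(sigma, Y), Y) for a, Y in sigma if a > 0]
    for s in sorted({r for r, _ in shifted}, reverse=True):
        if X <= frozenset().union(*(Y for r, Y in shifted if r >= s)):
            return s
    return 0


def sur(sigma, s, strict=False):
    """Cumulative surprise set; strict=True gives the '+' variant."""
    if strict:
        return {Y for _, Y in sigma if s < active_rank(sigma, Y)}
    return {Y for _, Y in sigma if s <= active_rank(sigma, Y)}


def sh(sigma, s, h):
    return {Y for a, Y in sigma if active_rank(sigma, Y) == s and a == h}


def jlx_less(s1, s2):
    """True iff s1 is strictly preferred to s2 under the reading described above."""
    S1, S2 = {Y for _, Y in s1}, {Y for _, Y in s2}
    if S1 != S2:
        return S1 < S2  # P1: proper subset of shiftable propositions
    ranks = sorted({active_rank(s, Y) for s in (s1, s2) for _, Y in s})
    lengths = sorted({a for s in (s1, s2) for a, _ in s}, reverse=True)

    def sh_plus(sigma, s, h):
        higher = sur(s1, s, strict=True) | sur(s2, s, strict=True)  # assumed union
        return sh(sigma, s, h) - higher

    for s in ranks:  # bottom-up search for the first diverging rank
        split = [h for h in lengths if sh_plus(s1, s, h) != sh_plus(s2, s, h)]
        if not split:
            continue
        if sur(s1, s) != sur(s2, s):
            return sur(s1, s) < sur(s2, s)  # P2
        if s == math.inf:
            return False
        h = split[0]  # largest shifting length where the constructions split
        return sh_plus(s1, s, h) < sh_plus(s2, s, h)  # P3
    return False


A = prop(lambda a, b: a)
B = prop(lambda a, b: b)
AorB = A | B

ex1 = ([(3, AorB)], [(1, A), (1, B), (1, AorB)])
ex2 = ([(1, A), (3, B)], [(2, A), (2, B)])
ex3 = ([(1, A), (2, B)], [(2, A), (1, B)])
ex4 = ([(2, A), (2, B), (2, AorB)], [(3, A), (3, B), (1, AorB)])

for left, right in (ex1, ex2, ex3, ex4):
    print(jlx_less(left, right), jlx_less(right, left))
```

Only ranks and shifting lengths that actually occur in one of the two constructions need to be tested, since the shifting effort sets are empty everywhere else.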

What makes the $\kappa\pi$-construction-ordering $\prec_{jlx}$ particularly attractive is the
existence of a canonical minimal $\kappa\pi{+}$-model for each consistent finite default base.

Theorem 5.2 (Canonicity)

Let $\Delta \subseteq \mathcal{L}(\Rightarrow)$ be finite and consistent. Then there is a single $\prec_{jlx}$-minimum
$\sigma \models_{\kappa\pi+} \Delta$. We set $JLX[\Delta] = R_\sigma$.

This gives us the following preferential default entailment relation.

Definition 5.3 (JLX-entailment)

Let $\Sigma \cup \{\psi\} \subseteq \mathcal{L}$, $\varphi_\Sigma = \bigwedge\Sigma$, and $\Delta \subseteq \mathcal{L}(\Rightarrow)$ be finite and consistent.

• $\Sigma \cup \Delta \;|\!\sim_{jlx}\; \psi$ iff $JLX[\Delta](\neg\psi|\varphi_\Sigma) > 0$.

Obviously, $|\!\sim_{jlx}$ is a rational inference relation in the sense of Lehmann. Its main
strengths are the existence of a transparent preferential semantics, its adherence
to the minimal construction philosophy, its combination of the epistemic
construction paradigm with normality maximization, and its verification of relevant
inference patterns. Let us discuss some examples. As a warm-up, we consider
the exceptional inheritance pattern. So, let $\Delta_1$ be a default base telling us that
birds normally fly and that birds are normally small (big = non-small).

• $\Delta_1 = \{B \Rightarrow F, B \Rightarrow S\}$.

We want to determine $JLX[\Delta_1]$. Obviously, the set of shiftable propositions is
$S_1 = \{B \wedge \neg F, B \wedge \neg S\}$. Accordingly, the JLX-construction has the form

• $x(B \wedge \neg F) + y(B \wedge \neg S)$ for some $x, y \geq 0$.

The corresponding ranking constraints are $R(B \wedge F) + 1 \leq R(B \wedge \neg F)$ and
$R(B \wedge S) + 1 \leq R(B \wedge \neg S)$. It is clear that for $s \in ]0,1]$, $sur_1(s) = \{B \wedge \neg F, B \wedge \neg S\}$.
Obviously, the best possible solution is then to have $sur_1(s) = \emptyset$ for all $s \in ]1,\infty]$,
i.e. shifting minimization becomes obsolete.

• $JLX[\Delta_1] = R_0 + 1(B \wedge \neg S) + 1(B \wedge \neg F)$.

It follows that $\{B \wedge \neg S\} \cup \Delta_1 \;|\!\sim_{jlx}\; F$. A more sophisticated example is the big
birds hammer (BBH), where we add to $\Delta_1$ the default that unusual - non-flying
or big - birds normally don't fly.

• $\Delta_2 = \{B \Rightarrow F, B \Rightarrow S, B \wedge (\neg F \vee \neg S) \Rightarrow \neg F\}$.

What we want to know is whether big birds normally fly and non-flying birds
are normally small. The intuitive answer seems to be that big birds normally
don't fly, whereas we cannot assume that non-flying birds are normally small.
That is, we would expect from a suitable $|\!\sim$ that

• $\{B \wedge \neg S\} \cup \Delta_2 \;|\!\sim\; \neg F$ and $\{B \wedge \neg F\} \cup \Delta_2 \;|\!\not\sim\; S$.

Interestingly, the traditional proposals either satisfy the big birds hammer
(system Z) or the exceptional inheritance requirement (Lehmann's lexicographic
closure, Geffner's conditional entailment), but not both. Fortunately, our minimal
construction strategy is more successful. So, let us compute $JLX[\Delta_2](F|B \wedge \neg S)$
and $JLX[\Delta_2](\neg S|B \wedge \neg F)$. The set of shiftable propositions is $S_2 = \{B \wedge \neg F, B \wedge
\neg S, B \wedge \neg S \wedge F\}$. Thus, the JLX-construction takes the form

• $x(B \wedge \neg F) + y(B \wedge \neg S) + z(B \wedge \neg S \wedge F)$.
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The ranking constraints resulting from A 2 include those from Ai together with 
R{B A -iF) + 1 < R{B A A F). For s s]0, 1], we have sur 2 {s) = {B A ~<F, B A 
-•S, B A -'S A F}. It is easy to see that we realize the best surprise minimization 
if sur 2 {s) = {B A ~<S A F} for s G]1,2] and sur 2 {s) = 0 for s g]2,oo]. This 
requires R{B A -iF) = R{B A -^S) = 1 and R{B A -■S' A F) = 2, which entails 
X = l,y = 0, z = 2 and 

• JLX[Zl2] = Fo + 1(FA-F) +2(FA-S'AF). 

That is, JLX[Z\2](F|F A -■S') = 1 and JLX[Z\2](-'5'|F A -iF) = 0, which meets 
our expectations. To see shifting minimization in action, we may consider the 
following example, which will also shed light on justifiable constructibility. 

• $\Delta_3 = \{T \Rightarrow^{2} A \wedge B,\ T \Rightarrow A,\ T \Rightarrow B\}$.

$\Delta_3$ induces the constraints $\{R(\neg A \vee \neg B) \geq 2,\ R(\neg A) \geq 1,\ R(\neg B) \geq 1\}$. The best
possible cumulative surprise scenario is given by $sur_3(s) = \{\neg A \vee \neg B,\ \neg A,\ \neg B\}$
for $s \in\, ]0, 2]$ and $sur_3(s) = \emptyset$ for $s \in\, ]2, \infty]$. In this context, the unique $\kappa\pi^+$-model
minimizing the shifting lengths is $1(\neg A \vee \neg B) + 1\neg A + 1\neg B$. That is

• $JLX[\Delta_3] = R_0 + 1(\neg A \vee \neg B) + 1\neg A + 1\neg B$.

For instance, $\{\neg A\} \cup \Delta_3 \mid\!\sim B$. Like system JZ, JLX-entailment also validates
basic inferential invariance features [Weydert 98]. 

Theorem 54 (Inferential invariance) 

JLX verifies representation independence and minimum irrelevance. 
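
The constructions obtained for $\Delta_1$ and $\Delta_2$ can be checked numerically. The following Python sketch is only an illustration under stated assumptions, not part of the formalism above: it assumes that $R_0$ assigns rank 0 to every world, that a shift $a(\varphi)$ adds $a$ to the rank of each world satisfying $\varphi$, and that the rank of a proposition is the minimum over its worlds. The helper names `rank` and `cond_rank` are hypothetical.

```python
from itertools import product

ATOMS = ("B", "F", "S")  # bird, flies, small


def rank(shifts, prop):
    """Rank of prop under R_0 + shifts, assuming R_0 is 0 on every world.

    shifts is a list of (length, proposition) pairs; each world pays the
    length of every shifted proposition it satisfies, and a proposition
    gets the minimum cost over the worlds in which it holds.
    """
    costs = []
    for values in product((True, False), repeat=len(ATOMS)):
        w = dict(zip(ATOMS, values))
        if prop(w):
            costs.append(sum(a for a, phi in shifts if phi(w)))
    return min(costs) if costs else float("inf")


def cond_rank(shifts, target, given):
    """Conditional rank R(target | given) = R(target and given) - R(given)."""
    return rank(shifts, lambda w: target(w) and given(w)) - rank(shifts, given)


def b_not_f(w):
    return w["B"] and not w["F"]


def b_not_s(w):
    return w["B"] and not w["S"]


def b_not_s_f(w):
    return w["B"] and not w["S"] and w["F"]


# Delta_1: x = y = 1.  Delta_2 (big birds hammer): x = 1, y = 0, z = 2.
jlx1 = [(1, b_not_f), (1, b_not_s)]
jlx2 = [(1, b_not_f), (0, b_not_s), (2, b_not_s_f)]

# Exceptional inheritance in Delta_1: for big birds, not flying is surprising.
print(cond_rank(jlx1, lambda w: not w["F"], b_not_s))  # 1
# Big birds hammer in Delta_2.
print(cond_rank(jlx2, lambda w: w["F"], b_not_s))      # 1
print(cond_rank(jlx2, lambda w: not w["S"], b_not_f))  # 0
```

A positive conditional rank marks the target as surprising, so its negation is expected; a conditional rank of 0 licenses no such conclusion. This matches the values stated above for $\Delta_2$.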



6 JLX and JZ 

Our fourth minimal construction principle is justifiable constructibility, which 
was first discussed, in a slightly different formal context, in [Weydert 96]. This 
is an important notion, which is also backed by the information-minimization 
philosophy [Weydert 98]. 

Definition 61 (Justifiable constructibility) 

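$\sigma$ is a justifiably constructible $\kappa\pi^+$-model w.r.t. $\Delta = \{\varphi_i \Rightarrow^{s_i} \psi_i \mid i \leq n\}$ iff

• $\sigma = \sum_{i \leq n} a_i(\varphi_i \wedge \neg\psi_i) \models_{\kappa\pi^+} \Delta$,

• $0 < a_i$ implies $R_\sigma(\varphi_i \wedge \psi_i) + s_i = R_\sigma(\varphi_i \wedge \neg\psi_i)$.

$JMod_{\kappa\pi^+}(\Delta)$ is the set of justifiably constructible models w.r.t. $\Delta$.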

The JZ-construction procedure sanctions the following result. 

Theorem 62 (Existence) 

$Mod_{\kappa\pi}(\Delta) \neq \emptyset$ implies $JMod_{\kappa\pi^+}(\Delta) \neq \emptyset$.
Justifiable constructibility and $\prec_{jlx}$-minimization represent different faces of the
minimal construction philosophy. To clarify their relationship, consider our earlier
example $\Delta_3 = \{T \Rightarrow^{2} A \wedge B,\ T \Rightarrow A,\ T \Rightarrow B\}$. It is not difficult to see that
$R(\neg A \vee \neg B) \geq 2$ has to be satisfied as an equality constraint, whereas $R(\neg A) \geq 1$
and $R(\neg B) \geq 1$ are necessarily oversatisfied, i.e. they do not sanction shiftings
of $\neg A$ and $\neg B$. Consequently, $2(\neg A \vee \neg B) + 0\neg A + 0\neg B$ is the only justifiably
constructible $\kappa\pi^+$-model of $\Delta_3$, which therefore corresponds to the JZ-model.
It clearly differs from the JLX-model $1(\neg A \vee \neg B) + 1\neg A + 1\neg B$. So we see that
$\prec_{jlx}$-minimality does not guarantee justifiable constructibility.
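
The contrast between the two models can be made concrete with a small check. The Python sketch below is an illustration only: it assumes a zero base ranking, reads each default as the constraint "rank of the verifying proposition plus strength is at most the rank of the falsifying proposition" (strengths 2, 1, 1 as in the constraints stated for this example), and treats a positive shift as justified only when its constraint holds as an equality. The names `is_model` and `is_justified` are hypothetical.

```python
from itertools import product

# Defaults of Delta_3 as (strength, verifying proposition, falsifying proposition)
DELTA_3 = [
    (2, lambda w: w["A"] and w["B"], lambda w: not (w["A"] and w["B"])),
    (1, lambda w: w["A"], lambda w: not w["A"]),
    (1, lambda w: w["B"], lambda w: not w["B"]),
]


def rank(lengths, prop):
    """Minimal total shift over the worlds satisfying prop (R_0 = 0 assumed)."""
    costs = []
    for a, b in product((True, False), repeat=2):
        w = {"A": a, "B": b}
        if prop(w):
            costs.append(sum(length for length, (_, _, falsifier)
                             in zip(lengths, DELTA_3) if falsifier(w)))
    return min(costs) if costs else float("inf")


def is_model(lengths):
    """Every ranking constraint R(verifier) + strength <= R(falsifier) holds."""
    return all(rank(lengths, v) + s <= rank(lengths, f) for s, v, f in DELTA_3)


def is_justified(lengths):
    """Positive shifts are allowed only where the constraint is an equality."""
    return is_model(lengths) and all(
        length == 0 or rank(lengths, v) + s == rank(lengths, f)
        for length, (s, v, f) in zip(lengths, DELTA_3))


jz_model = (2, 0, 0)   # 2(~A v ~B) + 0~A + 0~B
jlx_model = (1, 1, 1)  # 1(~A v ~B) + 1~A + 1~B

print(is_model(jz_model), is_justified(jz_model))    # True True
print(is_model(jlx_model), is_justified(jlx_model))  # True False
```

Both shift vectors satisfy all three constraints, but in the JLX-model the shifts of $\neg A$ and $\neg B$ are positive although their constraints are oversatisfied (rank 2 against a required 1), which is why it fails the justification test.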

In fact, this example illustrates another interesting peculiarity of justifiable 
constructibility. Consider $\Delta_4 = \{T \Rightarrow A \wedge B,\ T \Rightarrow^{2} A,\ T \Rightarrow^{2} B\}$. Although $\Delta_4$
has exactly the same $\kappa\pi^+$-models as $\Delta_3$, we have

• $JMod_{\kappa\pi^+}(\Delta_4) = \{0(\neg A \vee \neg B) + 2\neg A + 2\neg B\} \neq$

$JMod_{\kappa\pi^+}(\Delta_3) = \{2(\neg A \vee \neg B) + 0\neg A + 0\neg B\}$.

That is, we cannot always determine $JMod_{\kappa\pi^+}(\Delta)$ from $Mod_{\kappa\pi^+}(\Delta)$, we may
have to exploit $\Delta$ itself. Our construction semantics is not fine-grained enough
to fully grasp justifiable constructibility. It follows that system JZ cannot be
defined by minimizing a global preference order over $\kappa\pi$-constructions, which
could only exploit $Mod_{\kappa\pi^+}(\Delta)$. Although system JZ and JLX-entailment share
standard structural and inferential features, we see that there are also substan- 
tial differences. So, we may distinguish at least two major minimal construction 
traditions in default reasoning. One possibility to bridge the gap between these 
approaches would be to combine them by restricting $\prec_{jlx}$-minimization to justifiably
constructible $\kappa\pi^+$-models. Surprisingly, this doesn’t bring us anything
new. 

Theorem 63 (Equivalence JZ - JJLX) 

Let $\Delta \subseteq \mathcal{L}(\Rightarrow)$ be a finite consistent default base. Then the JZ-construction is
the unique $\prec_{jlx}$-minimal justifiably constructible $\kappa\pi^+$-model of $\Delta$.

This result gives us a transparent - although not $\kappa\pi^+$-invariant - (partial) preferential
semantics for system JZ. The main conclusion of this paper is that there
are - at least but presumably not more than - two major implementations of 
the minimal construction philosophy, JLX-entailment and system JZ. Thus, sys- 
tem JZ is slightly less hegemonial than we have thought. Both represent natural 
powerful rational default entailment notions exhibiting nice features. Further 
research will show which one offers the most reasonable default conclusions. 
Abstract. The suppression of Modus Ponens by the introduction of a second
conditional is introduced as a result relevant both to psychologists and to AI researchers
interested in default reasoning. Some psychological considerations on
the explanation of this effect, together with (a) their tentative formalisation
within the framework of default logic, and (b) recent experimental results from
the present authors, lead to the conclusion that our understanding of ordinary
human default reasoning would benefit from considering the existence of a specific
class of conditional statements, with the pragmatic status of “preconditionals”.



1 Introduction 

Whereas default rules and the handling of their exceptions have long been central 
issues for Artificial Intelligence researchers interested in formalizing human reason- 
ing, psychologists have only recently embraced the task of investigating human de- 
fault reasoning: Psychologists used to consider that human reasoners may treat a 
conditional either as a material implication or as a biconditional, but neglected the 
default, exception-flawed nature of most everyday conditionals (see [1] for a review). 
Things have changed, however, and during the past 10 years many studies have ad- 
dressed human reasoning with exception-flawed conditionals. Yet, one paradoxical 
aspect of this situation is that (a) many notions that have flourished in the psychology 
of default reasoning straightforwardly stem (perhaps unsurprisingly) from one of the
best established exception-handling formalisms, Reiter’s default logic [2], whereas (b)
most of these notions were introduced in order to explain a rather intriguing result that 
default logic would have difficulty in accounting for, the so-called “suppression of 
Modus Ponens”. 

In the first section hereafter, this result is introduced first in its original form, 
which is more closely related to non-monotonic reasoning, then in later forms which 
highlight its relevance to uncertain reasoning. It is argued that a satisfying formalisa- 
tion of human reasoning should be able to reflect such a robust and general inferential 
behaviour. 

Next, some psychological accounts of the suppression of Modus Ponens are briefly 
(and partially) summarised: It is argued that a formalisation of these accounts in the 
framework of default logic (the formalism with which these accounts are most closely
acquainted) clearly demonstrates that the effect cannot be accounted for without
appealing to some pragmatic considerations on the way people interpret conditional
assertions. Then, building on recent results by the present authors, the case is
made for the existence of a very specific class of conditional assertions (dubbed
preconditionals) whose function is to unconditionally suggest that the justification (in
Reiter’s sense) of a default is not met, which in turn inhibits the derivation of a Modus 
Ponens inference. 



2 The Suppression of Modus Ponens 

Modus Ponens is certainly one of the most basic, automatic inferences the human 
mind can draw (on this subject, see [3]). Generations of participants in reasoning 
experiments have been presented with premises like “if the ignition key is turned, then the
car will start; the ignition key is turned” - and these participants have always almost
unanimously declared that what followed was “the car will start”.

However, Byrne [4] first discovered that this very basic inference could be “sup- 
pressed” (blocked, inhibited...) by the introduction of an additional, seemingly in- 
nocuous premise. Consider the following set of premises: 

“If the ignition key is turned, then the car will start; 

If there is gas in the tank, then the car will start; 

The ignition key is turned.” 

When presented with this set of premises, fewer than 40% of reasoners would derive the
conclusion that the car will start: Without any sound logical reason, but in an intui- 
tively appealing way, people refrain from applying Modus Ponens to the first and 
third premises of the set when, as we shall see in the next section, the antecedent of 
the second conditional is a justification of the first (default) conditional. 

This result may seem more closely related to nonmonotonic reasoning than to un- 
certain reasoning per se, but it has been quickly reframed in the field of uncertain 
reasoning: Various authors (e.g., [5], [6], [7], [8]) demonstrated that while reasoners 
rated the conclusion of the standard Modus Ponens argument as highly certain, they 
rated this same conclusion significantly less certain when presented with the 3- 
premise set. 

The suppression of Modus Ponens, either in its nonmonotonic framing or in its
uncertainty framing, has proven impressively robust through a large number of
replications using different thematic contents. The suppression of Modus Ponens by the
introduction of an additional conditional is undoubtedly robust, general, and routinely 
observed. Moreover, people that refrain from applying Modus Ponens have strong 
confidence in the fact that they are not committing any reasoning mistake. Indeed, 
there is strong intuitive appeal in refraining from applying Modus Ponens to the 3- 
premise set above. Anyway, whether this inferential behaviour is desirable or not, one 
would expect a satisfying formalisation of human reasoning to be able to account for 
such a general and robust psychological observation. 

In the following, the symbols α, β, and γ will be used to designate the predicates
involved in the premise set that leads to the suppression of Modus Ponens. Thus, the
complete premise set will be noted:

If α(x) then γ(x),
If β(x) then γ(x),
α(x).

The car-starting example above will be noted:

If Turnkey(car) then Start(car),
If Gas(car) then Start(car),
Turnkey(car).



3 Psychological Accounts and Their Possible Formalisation 

Whatever their subsequent theoretical options, most reasoning researchers interested 
in the suppression of Modus Ponens could be said to agree on at least one point, 
namely the peculiar nature of the antecedent of the second conditional. In brief, β is
such that were β(x) to be false, γ(x) could not be true whatever the truth-value of
α(x), although the occurrence of α(x) is usually considered sufficient to lead to the
occurrence of γ(x).

In order to illustrate this agreement, it is briefly outlined below how this point is 
made in two of the most recent contending accounts of the suppression of Modus 
Ponens. Then, a straightforward formalisation of this claim in the framework of de- 
fault logic is proposed. 

3.1 The Nature of the Second Antecedent 

[5] and [9] have in common a pragmatic approach to the suppression of Modus Ponens,
some details of which are presented in Section 4. Suffice it to say for now that
they consider β(x) as a requirement for γ(x) to be possible, unlike α(x):
In other words, β(x) is a necessary condition of γ(x), whereas α(x) is one of the
potential causes of γ(x). (No car will start without gas in its tank, and fortunately cars
do not start just because there is some gas in their tank, but when there is gas in the
tank, one may start the car either by turning the ignition key or, e.g., by hotwiring.)
The occurrence of α(x) is typically said to lead to the occurrence of γ(x) because the
necessary condition β(x) is usually part of the background assumptions that hold
when it is asserted that “if α(x) then γ(x)”. (When one asserts that “if you turn the
key, then the car will start”, everybody would usually assume that there is gas in the
tank.) In [9] β(x) is called a “Complementary Necessary Condition” of γ(x). Much to
the same effect, the term “precondition” is used in [5] to designate β(x).

Although the view of the suppression of Modus Ponens adopted in [10] is
very different from the one advocated independently in [5] and [9], the key suggestion in
[10] seems to imply the same assumption about the peculiar nature of β(x). It is suggested
in [10] that Modus Ponens is blocked because the second conditional makes
readily available to reasoners a counterexample situation to the first conditional,
namely the situation where α(x) is true but γ(x) is false because β(x) is false.
Clearly, this only makes sense if reasoners consider that the falsity of β(x) leads to
the falsity of γ(x) whatever the truth-value of α(x). One has reasons to consider that
the situation where a car has no gas in its tank provides a counterexample to the rule 
“if the ignition key is turned, then the car will start” only if one believes that the lack 
of gas makes it impossible to start a car whatever actions are taken regarding turning 
the key. 



3.2 Default Logic and the Suppression of Modus Ponens 

Keeping in mind the nature of the second antecedent as considered in Section 3.1,
what would be a satisfying formal transcription of the 3-premise set? Like most eve- 
ryday conditionals, the rule “if the ignition key is turned, then the car will start” is 
flawed with exceptions. Without the need to specify what these exceptions could be, 
we can take into account their possible occurrence by expressing the first premise as a 
normal default: 



$$\frac{\mathit{Turnkey}(car) : \mathit{Start}(car)}{\mathit{Start}(car)} \qquad (1)$$



Such a transcription would be inappropriate for the second premise, since the second
conditional does not express a default. Yet, one striking feature of the psychological
suggestions in Section 3.1 is that the characteristics assigned to β(x) precisely
make it the justification of a default that would have α(x) as its prerequisite and γ(x)
as its consequent. Hence, the most straightforward way to translate these suggestions
might be to consider that the second premise, rather than expressing a new default, is
introducing a variation of the first, a general default of the form:



$$\frac{\mathit{Turnkey}(car) : \mathit{Gas}(car)}{\mathit{Start}(car)} \qquad (2)$$



The 3-premise set:

If α(x) then γ(x),
If β(x) then γ(x),
α(x),

would then turn into the three formulas:

$$\frac{\alpha(x) : \gamma(x)}{\gamma(x)} \qquad (3)$$

$$\frac{\alpha(x) : \beta(x)}{\gamma(x)} \qquad (4)$$

$$\alpha(x). \qquad (5)$$

Is the suppression of Modus Ponens accounted for by such a formal transcription?
It is not, for γ(x) is still a conclusion that follows from the three formulas above. The
second conditional has been taken as specifying a counterexample situation to the
first, i.e., as replacing the “normal” justification γ(x) in the first default by the newly
specified justification β(x). However, the two defaults make no other restriction on
the derivation of γ(x) from α(x) than to ensure that nothing in the knowledge base is
inconsistent with either γ(x) or β(x). Were there any reason to consider that β(x) is
untrue, the conclusion γ(x) would be blocked. Since there is no information represented
here about β(x), β(x) has to be considered true, hence the derivation of γ(x).

The derivation of γ(x) would only be blocked if some information was available
that would hint at the non-satisfaction of the justification β(x). As a consequence, if
the suppression of Modus Ponens is to be explained in the general framework of default
logic, what is needed is some insight on how some information regarding the
falsity of β(x) could be conveyed by the 3-premise set. Are there any experimental
results that would support the view that some information regarding the falsity of
β(x) is indeed embedded in the 3-premise set? This issue is dealt with in the next
section.
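
The consistency check that decides whether a default such as (2) fires can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a full implementation of Reiter's default logic: it assumes the knowledge base is a plain set of literals (with a hypothetical `~` prefix for negation), so that a justification counts as consistent exactly when its negation is absent from the set, and it handles a single default only.

```python
def negate(literal):
    """Return the complementary literal, using '~' as the negation prefix."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def apply_default(facts, prerequisite, justification, consequent):
    """Fire one default 'prerequisite : justification / consequent' if possible.

    Simplifying assumption: facts is a set of literals, so the justification
    is consistent exactly when its negation is not among the facts.
    """
    if prerequisite in facts and negate(justification) not in facts:
        return facts | {consequent}
    return set(facts)


# Default (2): Turnkey(car) : Gas(car) / Start(car)
default_2 = ("Turnkey(car)", "Gas(car)", "Start(car)")

silent_on_gas = apply_default({"Turnkey(car)"}, *default_2)
doubt_on_gas = apply_default({"Turnkey(car)", "~Gas(car)"}, *default_2)

print("Start(car)" in silent_on_gas)  # True: nothing contradicts Gas(car)
print("Start(car)" in doubt_on_gas)   # False: the justification is not met
```

With no information about the gas, the default fires; only an explicit hint that the justification fails blocks the conclusion, which is the behaviour the discussion above attributes to this kind of default.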



4 Recent Empirical Results: Preconditional Statements

Whereas common world knowledge makes it obvious that the presence of gas in the 
tank is a necessary (and not a sufficient) condition for the car to start, the assertion “if 
there is gas in the tank, then the car will start” seems to give the presence of gas a 
different status: one of sufficiency, one of causality. This discordance between the 
ordinary role of gas (as a background necessary condition for the car to start) and the 
specific role the conditional syntax seems to grant it (as a sufficient factor for the car 
to start, a factor that would explain the starting of the car) is important because em- 
pirical research has demonstrated that people usually avoid explaining events by ap- 
pealing to their necessary conditions : Necessary conditions (e.g., presence of gas) are 
ordinarily considered infelicitous explanations of an event (e.g., a car starting). (See 
e.g., [11] and [12].) 

Yet there is one particular situation where necessary conditions are considered 
relevant explanations of an event: Situations where the necessary condition is not 
readily available, cannot be presupposed, or is not easily satisfied (as demonstrated in 
[13] and [14]). For example, saying that “Mr X. ate because there was food available” 
is more felicitous than saying “Mr X. ate because he was hungry” in the situation
where Mr X. is a refugee who had been starving for three weeks due to lack of food. 
As a consequence, it is conceivable that asserting “if there is gas in the tank then the 
car will start” (i.e., explaining the starting of the car by the satisfaction of a back- 
ground requirement) would only make sense if the presence of gas was not readily 
available, or could not be presupposed. 

From these considerations together with some elements of conversational prag- 
matics, two experimental predictions are derived in [5]. First, that most people, if the 
3-premise set was presented to them as a conversation, would recognise the intention 
of the second conditional: that is, conveying doubts on the satisfaction of the requirement
β(x). Second, that those people that did recognise the intention would manifest
low confidence in the conclusion γ(x), whereas those people that did not recognise
the intention would not decrease their certainty in γ(x) as compared to the certainty
they would grant it from a standard Modus Ponens argument.

It is reported in [5] that 60 students were presented with different Modus Ponens
premises (either in standard form or within a 3-premise set). The students had to (a) rate
on a 7-point scale (from “no chance to be true” to “certainly true”) the confidence
they had in the occurrence of γ(x), and (b) say whether, in their opinion, the speaker
asserting the second conditional wished to convey the idea that β(x) was probably
not true.

When asked if the second conditional of a 3-premise set was intended to convey
the idea that β(x) was probably not true, almost 80% of participants answered
affirmatively. More importantly, participants that recognised this intention
granted the conclusion γ(x) a mean certainty of 3.62 (on a 7-point scale), whereas
participants that did not recognise the intention granted this conclusion a mean certainty of
5.46. As a standard to compare those ratings with, the mean certainty granted to
conclusions of standard Modus Ponens arguments was 5.57.

People that did not see any specific information regarding the truth-value of β(x)
in the second conditional applied Modus Ponens as they would have done without the
second conditional. But most people take the second conditional not only as specifying
a justification β(x) to the default rule “if α(x) then γ(x)”, but also to convey
some information hinting at the non-satisfaction of the justification β(x). Thus, this
majority of reasoners seems to interpret the 3-premise set the following way:



$$\frac{\alpha(x) : \gamma(x)}{\gamma(x)} \qquad (3)$$

$$\frac{\alpha(x) : \beta(x)}{\gamma(x)} \qquad (4)$$

$$\neg\beta(x). \qquad (6)$$

$$\alpha(x). \qquad (5)$$



From this set of formulas, the conclusion γ(x) is no longer derivable. Thus, the
suppression of Modus Ponens could be explained, and this explanation formalised, if
one accepted the existence of a specific class of conditional assertions that will be
called here “preconditional statements”.

Preconditionals are statements with conditional syntax like “if β(x) then γ(x)”, where
β(x) is known from implicit knowledge to be a requirement for γ(x) to happen. Despite
their conditional syntax, preconditionals do not make any conditional claim:
They unconditionally suggest that their antecedent β(x) is untrue. Indeed, preconditionals
do not serve the same function as regular conditionals, nor do they obey the
same rules: They form a class of assertions that is pragmatically distinct from the
class of regular conditionals. Considering their existence could be the decisive step
towards solving the logical riddle of the suppression of Modus Ponens by lay reasoners,
and a promising step towards formalising ordinary human default reasoning.
Abstract. This paper contains, first, a brief formal exploration of the
relationships between information (statistically defined), statistical hypothesis 
testing, the channel capacity of a communication system, and uncertainty. 
Thereafter several applications of these ideas in experimental psychology are 
examined. The applications are grouped under “Mathematical theories that are 
not matched to the psychological task”, “The human observer treated as a 
physical system”, and “Bayes’ theorem”. 



1 Introduction 

The notion of information has entered experimental psychology through two, quite 
distinct, points of entry. The first was a paper by Miller and Frick [20] that introduced
Shannon’s [25] communication theory to a psychological audience. Garner [2, p. 8 et 
seq.] has charted the explosive impact that those ideas had within psychology. The 
second was through signal detection [29]. Without using the label ‘information’, 
these authors transposed the “Theory of signal detectability” [24] into sensory 
discrimination, and the “Theory of signal detectability” stands in a direct line of 
intellectual descent from the Neyman-Pearson Lemma [21; see 15]. 

Notwithstanding that within psychology these two traditions have evolved in 
complete independence from each other, their intellectual foundations are closely 
related. The first task of this account is to bring out that interrelationship as simply as 
possible. I then examine a number of applications within experimental psychology, 
some successful, others misconceived, with a view to some general conclusions about how
the idea of information might profitably be exploited and what mistakes need to be 
guarded against. 



2 Information and Uncertainty 

Suppose I do an experiment and record a matrix of data X. Because this particular 
configuration of data will play a pivotal role in what follows, I cite, as an example in 
Table 1, one set of data from Experiment 4 by Braida & Durlach [1]. In this 
experiment 1 kHz tones of various intensities were presented for 0.5 s, one at a time, 
and the subject asked to identify each one in turn. The matrix in Table 1 shows the
number of times each stimulus value was presented and the given identification made. 

Table 1. Absolute identification of 1 kHz tones with 2 dB spacing of stimulus values from
Braida & Durlach [1, Expt. 4, Subj. 7]

Stimuli                    Responses (dB)
(dB)      68   70   72   74   76   78   80   82   84   86
68       120   37    8    1    0    0    0    0    0    0
70        33   74   42   15    0    0    0    0    0    0
72         8   47   76   33   10    2    0    0    0    0
74         0    8   38   73   48   10    1    0    0    0
76         0    1    9   43  108   45    9    0    0    0
78         0    0    1    9   61   77   36    3    1    0
80         0    0    0    0    7   48   58   29    1    0
82         0    0    0    0    0    5   38   74   38    1
84         0    0    0    0    0    1    6   25  115   29
86         0    0    0    0    0    0    0    3   32  123


Suppose I have a particular hypothesis about my experiment. Call that hypothesis
$H_0$. I cannot tell whether $H_0$ fits my data absolutely, but I can ask whether it fits better
than some other hypothesis $H_1$. The Neyman-Pearson Lemma [21] tells us that the
optimum statistic for distinguishing $H_0$ from any other state of nature ($H_1$) is the
likelihood ratio $\lambda = P(X \mid H_1)/P(X \mid H_0)$, ultimately on the principle of choosing that
hypothesis which is the more likely in the light of the data.

If my experiment is not sufficiently decisive, I can repeat it to obtain two
independent data matrices, $X_1$ and $X_2$. Then

$$\lambda = P(X_1 \,\&\, X_2 \mid H_1)/P(X_1 \,\&\, X_2 \mid H_0) \qquad (1)$$

$$\quad = [P(X_1 \mid H_1)/P(X_1 \mid H_0)]\,[P(X_2 \mid H_1)/P(X_2 \mid H_0)],$$

because independent probabilities multiply. Taking logarithms in Eq. 1,

$$\ln \lambda = \ln[P(X_1 \mid H_1)/P(X_1 \mid H_0)] + \ln[P(X_2 \mid H_1)/P(X_2 \mid H_0)],$$

and the expression splits into two independent parts, one for each replication of the
experiment. Accordingly, it is convenient to define

$$\ln \lambda = \ln[P(X \mid H_1)/P(X \mid H_0)] \qquad (2)$$

to be the information in the data matrix $X$ in favour of hypothesis $H_1$ and against
$H_0$ [9, p. 5]. Note the involvement of two hypotheses. Information is information about
something. Data is absolute, but information is relative to the two hypotheses to be 
distinguished. 
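
The additivity of information over independent replications can be illustrated with a toy computation. The Python sketch below is an illustration under assumed data, not an analysis from this paper: the two data sets are hypothetical sequences of independent binary outcomes, and the two hypotheses (success probabilities 0.5 and 0.7) are chosen arbitrarily.

```python
import math


def log_likelihood(data, p):
    """Log-probability of a sequence of independent 0/1 outcomes with P(1) = p."""
    return sum(math.log(p if x == 1 else 1.0 - p) for x in data)


def information(data, p1, p0):
    """ln lambda = ln[P(X|H1)/P(X|H0)]: information for H1 and against H0."""
    return log_likelihood(data, p1) - log_likelihood(data, p0)


# Hypothetical data from two independent replications of the same experiment
x1 = [1, 1, 0, 1, 1, 0, 1, 1]
x2 = [1, 0, 1, 1, 1, 1, 0, 1]

combined = information(x1 + x2, p1=0.7, p0=0.5)
separate = information(x1, p1=0.7, p0=0.5) + information(x2, p1=0.7, p0=0.5)

print(combined, separate)  # the two values agree up to rounding error
```

Swapping the roles of the two hypotheses changes the sign of the result, which reflects the remark that information is always relative to the pair of hypotheses being distinguished.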



2.1 Testing Statistical Hypotheses 

Suppose hypothesis $H_0$ is a special case of $H_1$ (some otherwise free parameters are set
to zero or equal to each other). Then $\ln \lambda = \ln[P(X \mid H_1)/P(X \mid H_0)]$ is the optimum
statistic for testing $H_0$ against all the possible states of nature encompassed by $H_1$.
The statistic $2 \ln \lambda$ is distributed asymptotically as $\chi^2$ [35]. Most parametric statistical
tests (the analysis of variance, for example) fall out of this formulation [9], simply by
inserting appropriate hypotheses $H_0$ and $H_1$ in Eq. 2. The best-known exception is
Pearson’s $X^2$.

The statistical tests in use are those for which it is feasible to calculate the
distribution of the statistic when $H_0$ is true. As an example, suppose that $H_0$ asserts
that the row and column classifications in Table 1 are independent; (this is manifestly
false, but this is the $H_0$ for which the distribution of the likelihood-ratio is readily
calculable). Let $p_{ij}$ stand for the probability of some particular combination of
stimulus ($i$) and response ($j$); let $p_{i\cdot}$ be the marginal probability of stimulus $i$, and $p_{\cdot j}$
the marginal probability of response $j$. Then, the hypothesis of independence is

$$H_0:\ p_{ij} = p_{i\cdot}\, p_{\cdot j},$$

while the alternative hypothesis ($H_1$) allows the probabilities of individual cells ($p_{ij}$) to
assume any set of values that sum to unity. The probability ratio attaching to a trial in
which stimulus $i$ is presented and response $j$ occurs is $p_{ij}/(p_{i\cdot}\, p_{\cdot j})$ and the average
information, averaged over all combinations of stimulus and response, is

$$\sum_{ij} p_{ij} \ln[p_{ij}/(p_{i\cdot}\, p_{\cdot j})]. \qquad (3)$$

In practice, the unknown probabilities ($p_{ij}$, $p_{i\cdot}$, and $p_{\cdot j}$) are estimated from the data and
the resultant statistic gives us Wilks’ [34] likelihood-ratio test of independence in
two-way contingency tables.
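
As a worked illustration, the average information and the corresponding likelihood-ratio statistic can be estimated directly from the counts in Table 1. The Python sketch below is a minimal example, not the computation reported in any of the cited sources: it assumes the rows of the table are stimuli and the columns responses (the statistic is unchanged if they are transposed), estimates each probability by the observed proportion, and uses the hypothetical function name `transmitted_information`.

```python
import math

# Counts from Table 1 (rows: stimuli 68..86 dB; columns: responses 68..86 dB)
TABLE_1 = [
    [120, 37, 8, 1, 0, 0, 0, 0, 0, 0],
    [33, 74, 42, 15, 0, 0, 0, 0, 0, 0],
    [8, 47, 76, 33, 10, 2, 0, 0, 0, 0],
    [0, 8, 38, 73, 48, 10, 1, 0, 0, 0],
    [0, 1, 9, 43, 108, 45, 9, 0, 0, 0],
    [0, 0, 1, 9, 61, 77, 36, 3, 1, 0],
    [0, 0, 0, 0, 7, 48, 58, 29, 1, 0],
    [0, 0, 0, 0, 0, 5, 38, 74, 38, 1],
    [0, 0, 0, 0, 0, 1, 6, 25, 115, 29],
    [0, 0, 0, 0, 0, 0, 0, 3, 32, 123],
]


def transmitted_information(counts):
    """Estimate sum_ij p_ij ln[p_ij / (p_i. p_.j)] in nats; also return N."""
    n = sum(map(sum, counts))
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    info = 0.0
    for i, row in enumerate(counts):
        for j, n_ij in enumerate(row):
            if n_ij > 0:  # empty cells contribute nothing (0 ln 0 = 0)
                info += (n_ij / n) * math.log(
                    n_ij * n / (row_totals[i] * col_totals[j]))
    return info, n


info, n = transmitted_information(TABLE_1)
# 2 ln(lambda) for the test of independence; 81 = (10 - 1) * (10 - 1)
# degrees of freedom under the hypothesis of independence
g_statistic = 2 * n * info

print(f"N = {n}, information = {info:.3f} nats per trial")
print(f"2 ln(lambda) = {g_statistic:.1f}")
```

The factor $N$ appears because the log likelihood ratio sums the per-trial information over all $N$ trials, whereas formula 3 is an average per trial.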



2.2 Channel Capacity in a Communication System 

Suppose now that my experiment consists of sending messages through a 
communication system. As in Table 1, I record the number of times message 
(stimulus) i is sent and received as message (response) j. Given the resultant matrix 
of data (X), I ask: Is this communication channel working (with some degree of 
reliability, $H_1$) or is the line open-circuit ($H_0$)? The appropriate statistical test is
Wilks’ [34] likelihood-ratio test of independence. If the transmission takes $T$ s, then
$\sum_{ij} p_{ij} \ln[p_{ij}/(p_{i\cdot}\, p_{\cdot j})]/T$ is an estimate of the information transmitted per second.
Depending on my choice of message ensemble (that is, of my experimental design), 
the information transmitted might take various values. But there is an upper limit, 
achieved when the experiment is optimally matched to the statistical characteristics of 
the channel. This upper limit is known as the channel capacity. Shannon [25] 
showed that, given an arbitrary message source, a system of encoding could always be 
found that afforded transmission at a rate arbitrarily close to the limiting capacity of 
the channel, but that this capacity limit could never be exceeded. 



2.3 Uncertainty 

Suppose I send a single message, selected with probability $\{p_{i\cdot}\}$ from a set of possible
messages. This is received as message $j$. The mean information transmitted with an
arbitrary selection of the message is given by Eq. 3, and that expression has a
maximum value when the message received ($j$) identifies the message sent ($i$)
uniquely. This happens when $p_{ij} = p_{\cdot j}$, and the maximum value is then

$$-\sum_i p_{i\cdot} \ln p_{i\cdot}. \qquad (4)$$

This expression is the uncertainty of the choice of message. But if I know that
message $j$ was received, the posterior probabilities attaching to the different inputs,
calculated from Bayes’ theorem, are $p_{ij}/p_{\cdot j}$, and the uncertainty (now residual
uncertainty) is reduced to $-\sum_i (p_{ij}/p_{\cdot j}) \ln(p_{ij}/p_{\cdot j})$. The residual uncertainty averaged
over the different messages received is

$$-\sum_{ij} p_{ij} \ln(p_{ij}/p_{\cdot j}). \qquad (5)$$

But

$$\sum_{ij} p_{ij} \ln[p_{ij}/(p_{i\cdot}\, p_{\cdot j})] = -\sum_i p_{i\cdot} \ln p_{i\cdot} + \sum_{ij} p_{ij} \ln(p_{ij}/p_{\cdot j}), \qquad (6)$$

so the information transmitted is equal to the difference between the initial (stimulus)
uncertainty (Eq. 4) and the residual uncertainty (Eq. 5) given the message received.
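
The decomposition of transmitted information into initial minus residual uncertainty can be verified numerically. In the Python sketch below the joint probabilities are hypothetical values chosen only for illustration; the initial uncertainty is taken as minus the sum of $p \ln p$ over the marginal probabilities of the messages sent, and the residual uncertainty as minus the average of the log posterior probabilities $p_{ij}/p_{\cdot j}$.

```python
import math

# Hypothetical joint probabilities p_ij (rows: message sent, columns: received)
P = [
    [0.30, 0.05, 0.00],
    [0.05, 0.25, 0.05],
    [0.00, 0.05, 0.25],
]

row = [sum(r) for r in P]        # marginal probabilities of messages sent
col = [sum(c) for c in zip(*P)]  # marginal probabilities of messages received
cells = [(i, j, p) for i, r in enumerate(P) for j, p in enumerate(r) if p > 0]

# Mean information transmitted (the average log probability ratio)
information = sum(p * math.log(p / (row[i] * col[j])) for i, j, p in cells)
# Initial (stimulus) uncertainty
initial = -sum(p * math.log(p) for p in row if p > 0)
# Residual uncertainty, averaged over the messages received
residual = -sum(p * math.log(p / col[j]) for i, j, p in cells)

print(information, initial - residual)  # equal up to rounding error
```

If every column of `P` had a single non-zero entry, the residual term would vanish and the information would reach its maximum, the initial uncertainty.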

One might, for this reason, be tempted to suppose that uncertainty is fundamental 
and information derivative. But suppose the input message is a normally distributed
voltage (zero mean, variance $\sigma^2$) and the output similar, with correlation $\rho$. The
information transmitted is then $-\tfrac{1}{2}\ln(1-\rho^2)$ [9, p. 8], irrespective of the value of $\sigma^2$, but
the input uncertainty is $\tfrac{1}{2}\ln(2\pi\sigma^2) + \tfrac{1}{2}$, which depends on the choice of $\sigma^2$. When
information is calculated as a difference of uncertainties (as in Eq. 6), the scale factor
($\sigma^2$) drops out of the reckoning. Uncertainty therefore stands in relation to
information as velocity potential stands in relation to velocity or voltage to current 
flow in an electrical circuit. Only differences in potential or voltage are significant. 



3 Mathematical Theory Not Matched to the Psychological Task 

It is not sufficient merely for an equation to agree with an observed result; the 
assumptions from which that equation is derived must also match the details of the 
psychological experiment. 



3.1 Hick’s Law 

There are n equally probable stimuli (pea bulbs) arranged in a somewhat irregular 
circle. The subject responds as quickly as possible with a corresponding response. 
Hick [5] fitted his own data and some historic data from Merkel [19] to the equation

$$\text{Mean R.T.} = a \ln(n+1) \qquad (7)$$

in which the possibility of “no signal” was treated as an $(n+1)$th alternative. The
quality of the fit is shown in Figure 1, where the abscissa is scaled according to
$\ln(n+1)$. Now put $p_{i\cdot}$ equal to $1/(n+1)$ in formula 4. The stimulus uncertainty is
$\ln(n+1)$, and Eq. 7 is equivalent to

$$\text{Mean R.T.} = a\,(\text{Stimulus uncertainty}).$$
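
The equivalence between the two forms of Hick's equation rests on the uncertainty of $n+1$ equally probable alternatives being $\ln(n+1)$. The short Python sketch below checks this; the slope `a` is a hypothetical placeholder, not a value fitted to Hick's or Merkel's data, and the function names are illustrative.

```python
import math


def stimulus_uncertainty(probabilities):
    """Uncertainty -sum p ln p of a choice among alternatives, in nats."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def hick_mean_rt(n, a):
    """Mean reaction time a * ln(n + 1) for n equally probable stimuli."""
    return a * math.log(n + 1)


a = 0.2  # hypothetical slope (seconds per nat), for illustration only
for n in (1, 2, 4, 8):
    # n stimuli plus the "no signal" alternative, all equally probable
    uniform = [1 / (n + 1)] * (n + 1)
    uncertainty = stimulus_uncertainty(uniform)
    assert math.isclose(uncertainty, math.log(n + 1))
    print(n, round(uncertainty, 3), round(hick_mean_rt(n, a), 3))
```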
Fig. 1. Mean choice reaction times from [5] and Merkel [19] plotted against ln(n+1). Abscissa: No. of stimuli.

The idea here is that mean reaction time is equal to the time taken to pass a 
message through an ideal communication system to specify the response that needs to 
be made. This result, above all others, was influential in encouraging the idea that the 
human operator was analogous to a communication channel operating at maximum 
capacity. It has given us such terms as “(channel) capacity” and “encoding”. But this 
idea will not wash. A choice-reaction experiment involves the transmission of single 
stimuli, one at a time, a condition that affords no opportunity for the sophisticated 
coding on which Shannon’s theorem depends. Shannon’s theory applies only to the 
limiting rate at which messages from a continuous source can be passed through a 
channel, and those messages might suffer an arbitrary delay before transmission to 
allow for encoding. The logarithmic formula relates only to the length of encoded 
signal required to carry the message, not to the delay it might suffer in transmission. 

This mismatch between the assumptions of the mathematical theory and the 
circumstances of the experimental task has this consequence. The task becomes much 
more difficult if the subject is instructed to respond to the signal one, or two, or three 
places back in the series, notwithstanding that the task then approximates more 
closely the condition under which a communication channel operates efficiently; and 
performance collapses altogether when the response has to be produced four stimuli 
in arrear [7]. 



3.2 Wason’s Selection Task 

There are four cards, each of which has a letter on one side and a number on the other. 
Given the cards with ‘A’, ‘K’, ‘2’, and ‘7’ uppermost, which of them need to be 
turned over to discover whether it is true that “If a card has a vowel on one side, it has 
an even number on the other”? Only about four per cent of subjects select ‘A’ and ‘7’ 
[6]. 

Oaksford and Chater [22] proposed that subjects consider these two hypotheses
with respect to an imagined population of cards of which the four are but a sample:

$H_0$: P(vowel & odd number) = 0

$H_1$: Number (even or odd) independent of letter (vowel or consonant).

But statistical methodology requires that $H_0$ (the rule to be tested) be compared to all
alternative states of nature, that is to

$H_2$: P(vowel & odd number) $\neq$ 0,

not just to the special case $H_1$. Oaksford and Chater have surreptitiously assumed that
all possibilities alternative to $H_0$ and $H_1$ are seen by the subjects to have probability
zero, and that assumption is slipped in without even an attempt at justification. The
comparison between $H_0$ and this particular $H_1$ does not, in fact, generate a model of
Wason’s selection task. It does, however, generate the so-called ‘ravens paradox’
[18, 23].

Oaksford and Chater [22] next proposed that subjects select amongst the four cards
according to the expected yield of information measured according to

$\sum_i p_{ij} \ln(p_{ij}/p_{i\cdot}p_{\cdot j})$, (8)

where i indexes the hypothesis ($H_0$ or $H_1$) and j the choice of card. They believed that
such a sampling strategy would be optimal, but Klauer [8] has shown otherwise.
Formula 8 is the Shannon measure of information transmitted conditional on selecting
Card j. Now, the underside of Card j may well tell the subject that the rule does not
hold, but that is not what formula 8 measures. It compares, instead, the hypotheses

$H_0'$: Underside of card independent of whether $H_0$ or $H_1$ holds, and

$H_1'$: Underside of card related to the distinction between $H_0$ and $H_1$.

That is, it measures the extent to which the underside of Card j is relevant to the
discrimination between $H_0$ and $H_1$. While this might appear a plausible basis for
choice, even more relevant would be the expected yield of information in favour of
$H_0$ and against $H_1$ (or, more correctly, $H_2$).

However, if the information is correctly evaluated with respect to $H_0$ and $H_2$
(P(vowel & odd number) = 0 and P(vowel & odd number) $\neq$ 0), it delivers the
conventional logical prescription, ‘A’ and ‘7’ [16]. This is just what one should
expect from a valid mathematical theory.
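
The conventional logical prescription itself can be reproduced by a simple falsification check. The sketch below is not the information calculation of [16]; it only asks, for each visible face, whether some possible hidden face would falsify the rule. The representative hidden faces ('A', 'K', 2, 7) and all function names are assumptions made for the example.

```python
VOWELS = set("AEIOU")

def violates_rule(letter, number):
    """True if a card falsifies 'vowel on one side, so even number on the other'."""
    return letter in VOWELS and number % 2 == 1

def must_turn(visible, letters="AK", numbers=(2, 7)):
    """A card needs turning only if some possible hidden face falsifies the rule."""
    if isinstance(visible, str):
        # A letter is showing, so the hidden face is a number.
        return any(violates_rule(visible, n) for n in numbers)
    # A number is showing, so the hidden face is a letter.
    return any(violates_rule(letter, visible) for letter in letters)

cards = ["A", "K", 2, 7]
selected = [card for card in cards if must_turn(card)]
print(selected)  # ['A', 7]
```

Only the vowel and the odd number can conceal a counterexample, which is why 'K' and '2' carry no information about whether P(vowel & odd number) = 0.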



4 The Human Observer as a Physical System 

Kullback [9, p. 22] proved a fundamental theorem that says, in words, “There can be 
no gain of information by statistical processing of data.” This theorem has a profound 
application. 







4.1 Signal Detection 

Think of the human observer as a purely physical system and the stimulus as a datum. 
Sensory analysis of that input equates to “statistical processing of data” and human 
performance is limited by the information implicit in the stimulus. The signal- 
detection operating characteristic provides a direct estimate of the distribution of the 
information transmitted [see 12, pp. 98-103]; in fact, the logarithm of the gradient of 
the operating characteristic is the information random variable in favour of ‘signal’ 
and against ‘noise alone’. This provides a basis for comparing the information 
implicit in the observer’s responses with the information supplied by the stimulus. 

Signal detection theory has been revolutionary in the field of sensory 
discrimination. It distinguishes between the information available to the observer and 
the partitioning of values of that information between the available responses (the 
choice of criteria). Looking solely at information throughput, and disregarding the 
criteria, it can be shown that the information available to the observer is derived from 
a sensory process that is differentially coupled to the physical stimulus, because the 
component of information derived from the stimulus mean is entirely absent from the 
information implicit in the observer’s performance [13, pp. 169-172]. This provides 
an explanation of Weber’s Law and of many other related phenomena [see 13, 14].



5 Bayes’ Theorem 

Bayes’ theorem specifies how posterior probabilities may be calculated from the 
combination of prior probabilities and a probability ratio calculated from 
experimental data. If the result of the first replication of the experiment in Eq. 1 be
taken as defining a prior probability ratio, $[P(H_1)/P(H_0)]$, for the second replication,

$[P(H_1 \mid X)/P(H_0 \mid X)] = [P(H_1)/P(H_0)]\,[P(X \mid H_1)/P(X \mid H_0)]$.

On taking logarithms,

$\ln[P(H_1 \mid X)/P(H_0 \mid X)] = \ln[P(H_1)/P(H_0)] + \ln[P(X \mid H_1)/P(X \mid H_0)]$ (9)

or

Posterior information = Prior information + Information in data X.

Probability ratios are exponents of information values and a simple transformation 
relates Eq. 9 to the usual form of Bayes’ theorem. Human performance is manifestly 
influenced in many experiments by the probabilities of the different stimuli (e.g. Fig. 
1). The present question is whether such effects are accurately described by Eq. 9. 



5.1 Two-Choice Reaction Experiments 

There are two alternative signals. One of two responses is to be made “as quickly as 
possible”. One interesting idea is that reaction time is the time taken to collect 
sufficient information to make a response to some prescribed level of accuracy. 
Imagine the experiment of Eq. 1 to be repeated many times; if i indexes successive
replications,

$\ln[P(H_1 \mid \sum_i X_i)/P(H_0 \mid \sum_i X_i)] = \ln[P(H_1)/P(H_0)] + \sum_i \ln[P(X_i \mid H_1)/P(X_i \mid H_0)]$. (10)

The sequence of replications $\{X_i\}$ continues until the posterior information (on the
left) reaches some desired bound, which guarantees that Response 1 ($H_1$), rather than
Response 0 ($H_0$), is correct to within some small degree of error. The subject chooses
a desired level of accuracy and the reaction times follow stochastically from that 
error-criterion. The mathematics required to develop the idea is the sequential 
probability ratio test [33], and the idea itself was first suggested by Stone [27]. In 
effect, Bayes’ theorem is continuously and repeatedly applied to test the validity of 
the accumulated evidence and one can hardly get more Bayesian than that. 
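
The accumulation described by Eq. 10 can be sketched as a small simulation of the sequential probability ratio test. Everything numerical here is an assumption made for illustration: the two hypotheses are taken to be normal distributions with means +0.5 and -0.5 and unit standard deviation, the prior odds are even, and the bound ln 19 corresponds to a nominal error rate of about 5%. None of these values comes from the experiments discussed.

```python
import math
import random

def sprt_trial(rng, mean_h1=0.5, mean_h0=-0.5, sd=1.0,
               prior_log_odds=0.0, bound=math.log(19), truth=1):
    """Accumulate log likelihood ratios (Eq. 10) until the posterior log odds
    reach +bound (Response 1) or -bound (Response 0)."""
    log_odds = prior_log_odds
    true_mean = mean_h1 if truth == 1 else mean_h0
    steps = 0
    while abs(log_odds) < bound:
        x = rng.gauss(true_mean, sd)
        # ln[P(x|H1)/P(x|H0)] for two normal densities with a common s.d.
        log_odds += ((x - mean_h0) ** 2 - (x - mean_h1) ** 2) / (2 * sd ** 2)
        steps += 1
    response = 1 if log_odds >= bound else 0
    return response, steps

def simulate(n_trials=2000, seed=0):
    """Run many trials on which H1 is true; return accuracy and mean sample count."""
    rng = random.Random(seed)
    results = [sprt_trial(rng) for _ in range(n_trials)]
    accuracy = sum(response == 1 for response, _ in results) / n_trials
    mean_steps = sum(steps for _, steps in results) / n_trials
    return accuracy, mean_steps

accuracy, mean_steps = simulate()
print(accuracy, mean_steps)
```

In this sketch the number of samples taken before the bound is reached plays the role of the reaction time, and raising `bound` trades longer times for fewer errors.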

This idea does not work. While my own data [10] might suggest otherwise, there 
are further unpublished results that show it to be hopeless. I explain why it will not 
work. 



5.2 The Choice of Criterion in Signal-Detection Experiments 

If signal detection data conform accurately to the normal, equal variance, model, there 
is a particular location of the criterion, varying with signal probability, which 
minimises the total number of errors [3]. Figure 2 compares two sets of calculations, 
‘Bayes’ theorem’ from [4, p. 90] using data from one subject in the experiment by 
Tanner, Swets, & Green [30] and ‘Probability matching’ based on a suggestion by 
Thomas & Legge [31]. The abscissa coordinate is the criterion value of likelihood 
ratio specified by ‘Bayes’ theorem’ (asterisks) and by ‘Probability matching’ (open 
circles) respectively. The ordinate is the criterion value (the same in both 
calculations) estimated from the data. If the predictions were accurate, then the 
estimated criterion values would be equal to the calculated values. The diagonal 
dashed 45° line tracks those estimates of criterion placement that would match the 
predictions (either set) exactly. It can readily be seen that the actual likelihood ratio 
at the estimated criterion is always conservative, too close to unity, relative to the 
predictions from Bayes’ theorem. But suppose, instead of minimizing the number of 
errors, the subject merely adjusts the frequency of “Yes” responses to match the 
frequency of signals (i.e., ‘probability matching’, [31]). The concordance between 
the open circles and the dashed line shows that idea to work well. But why? 

In this experiment [30] the subject had feedback at the end of each trial and so 
knew immediately when he had made an error. Under such circumstances, subjects 
effect a large shift in criterion following an error, shifting in the direction that reduces 
the risk of a similar error in future (but increasing the risk of an error of the opposite 
kind; [28]). The signal-detection criterion is not a fixed parameter, but evolves from 
trial to trial in a dynamic equilibrium, driven by re-adjustments following each error. 
It drifts towards a value where the absolute numbers of errors of each kind are equal. 
That equality means that the numbers of “Yes” responses lost through a mistaken 
“No” equate to the number gained through a mistaken “Yes” and, overall, the 
frequency of “Yes” matches the frequency of the signals. 

Fig. 2. Calculations of likelihood ratio at optimal criterion (‘Bayes’ theorem’) from [4, p. 90],
and of likelihood ratio given probability matching [31] using data from one subject in the
experiment by Tanner, Swets, & Green [30].



Table 2. Numbers of all combinations of signal and response for two subjects in the experiment
by Tanner, Swets, & Green [30].

             P(signal)   “No” | noise   “Yes” | noise   “No” | signal   “Yes” | signal

Observer 1     0.10          521              19              37               23
               0.30          365              55              75              105
               0.50          194             106              84              216
               0.70           90              90              72              348
               0.90           12              48              22              518

Observer 2     0.10          492              48              40               20
               0.30          334              86              89               91
               0.50          180             120              87              213
               0.70           86              94              90              330
               0.90           18              42              43              497



Table 2 sets out the relevant data in more detail, showing the actual numbers of 
errors (Cols 3 & 4), both for the subject in Fig. 2 and for another subject in the same 
experiment [4, p. 95]. The numbers in Cols 3 and 4 are approximately the same,
even though the corresponding numbers of correct responses (Cols 2 & 5), and 
therefore the absolute probabilities of each kind of error, vary widely. 
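
The comparison can be made explicit by tabulating, for each row of Table 2, the two error counts, the corresponding error probabilities, and the overall frequency of “Yes” responses alongside the signal probability. The counts below are those of Table 2; the function and field names are assumptions made for this sketch.

```python
# Rows of Table 2: (P(signal), "No"|noise, "Yes"|noise, "No"|signal, "Yes"|signal)
TABLE_2 = {
    "Observer 1": [
        (0.10, 521, 19, 37, 23),
        (0.30, 365, 55, 75, 105),
        (0.50, 194, 106, 84, 216),
        (0.70, 90, 90, 72, 348),
        (0.90, 12, 48, 22, 518),
    ],
    "Observer 2": [
        (0.10, 492, 48, 40, 20),
        (0.30, 334, 86, 89, 91),
        (0.50, 180, 120, 87, 213),
        (0.70, 86, 94, 90, 330),
        (0.90, 18, 42, 43, 497),
    ],
}

def summarise(row):
    """Error counts, error probabilities and overall "Yes" rate for one row."""
    p_signal, correct_no, false_yes, false_no, correct_yes = row
    total = correct_no + false_yes + false_no + correct_yes
    return {
        "p_signal": p_signal,
        "false_yes": false_yes,                              # Col 3
        "false_no": false_no,                                # Col 4
        "p_false_yes": false_yes / (correct_no + false_yes), # P("Yes" | noise)
        "p_false_no": false_no / (false_no + correct_yes),   # P("No" | signal)
        "yes_rate": (false_yes + correct_yes) / total,
    }

for observer, rows in TABLE_2.items():
    for row in rows:
        print(observer, summarise(row))
```

Reading the output, the counts `false_yes` and `false_no` can be compared directly, while `p_false_yes` and `p_false_no` show how far the conditional error probabilities diverge across signal probabilities.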

In an experiment where many people expected prior probabilities to enter into a 
rational calculation based on Bayes’ theorem, that failed to happen. Instead, 
performance was driven by successive shifts of the criterion, oscillating around a 
dynamic equilibrium where the numbers of errors of each sort were approximately 
equal. To performance in this kind of experiment Bayes’ theorem does not apply. 



5.3 Absolute Identification

Analysis of two-choice reaction times [10, Ch. 8; 11] shows that both latencies and 
errors are subject to a similar series of trial-to-trial adjustments. That is, Bayes’ 
theorem does not apply, either, to the involvement of prior probabilities in choice- 
reaction times. But what about the aggregation of information (Eq. 10) during a 
single trial? The sequential probability ratio test is isomorphic to a random walk and 
can also be modelled as a diffusion process in continuous time. Is that idea applicable 
to the human operator? To see why not, I turn to an experiment on absolute 
identification [1, Expt 4]. 




Fig. 3. Estimates of the variance of identification judgments in each condition (different
stimulus spacings) in Experiment 4 by Braida and Durlach [1]. Differently shaped symbols
show estimates from three different subjects. (From “Reconciling Fechner and Stevens?” by D.
Laming, Behavioral and Brain Sciences, 1991, vol 14, p. 191. Reproduced by permission.)

The stimuli were ten 1 kHz tones of different amplitudes. The subjects were 
required to identify individual tones in isolation. In different sessions the tones were 
spaced at 0.25, 0.5, 1, 2, 3, 4, 5, 6 dB intervals. (Table 1 sets out the data for one 
session from this experiment). Figure 3 plots estimates of the variability of the 
identifications calculated in this manner. Torgerson’s [32] Law of Categorical 
Judgment with equal variances (Class 1C) was used as model, but with the means set 
equal to the decibel values of the stimuli. The one free parameter was the standard 
deviation, $\sigma$. Figure 3 plots the values of the variances, $\sigma^2$, for three individual
subjects, against (the square of) the stimulus spacing. The estimated variances
increase in proportion to the square of the spacing, with a small intercept, 1.52 dB$^2$.

The point here is that, except for the declining influence of the intercept, resolution
does not improve with wider spacing of the stimulus values. Identification of the 
stimuli is tied to the geometric ladder of stimulus magnitudes and is no better than 
ordinal [17, Ch. 10]. That is, the judgment of one stimulus relative to another is no 
better than <greater, about the same, less> and the aggregation of such crude ordinal 
comparisons cannot support a sequential probability ratio test procedure of the kind I 
envisaged in my study of two-choice reaction times [10]. 
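
The arithmetic behind this point can be sketched as follows. If the variance of the judgments is an intercept plus a term proportional to the square of the spacing, then the separation of adjacent stimuli, measured in standard deviations, approaches a ceiling instead of growing with the spacing. The intercept of 1.52 dB² is the value quoted above; the constant of proportionality (`SLOPE = 0.1`) is a hypothetical value, since the fitted slope is not given in the text.

```python
import math

INTERCEPT_DB2 = 1.52  # intercept quoted in the text (dB^2)
SLOPE = 0.1           # hypothetical constant of proportionality, not from the source

def judgment_sd(spacing_db):
    """Standard deviation of identification judgments at a given spacing."""
    return math.sqrt(INTERCEPT_DB2 + SLOPE * spacing_db ** 2)

def adjacent_resolution(spacing_db):
    """Separation of adjacent stimuli in units of the judgment s.d."""
    return spacing_db / judgment_sd(spacing_db)

ceiling = 1 / math.sqrt(SLOPE)  # limit reached once the intercept is negligible
for spacing in (0.25, 0.5, 1, 2, 3, 4, 5, 6):
    print(spacing, round(adjacent_resolution(spacing), 3), round(ceiling, 3))
```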



6 Conclusions 

(i) There are some simple interrelationships between the notions of statistical 
information, statistical hypothesis testing, and their applications in psychology, that 
are less well understood than they need to be. Under the influence of Shannon’s 
theory, psychologists are wont to suppose that information is an absolute. Not so! 
Data is absolute, but information is always relative to the two hypotheses between 
which it distinguishes. 

(ii) If the human operator be viewed as a purely physical system, then 
Kullback’s [9, p. 22] theorem applies unconditionally. Analysis of information flow
provides a ‘model-independent’ technique for identifying the ‘information-critical’ 
operations involved. In this way information theory provides, as it were, a ‘non- 
parametric’ technique for the investigation of all kinds of systems without the need to 
understand the machinery, to model the brain without modeling the neural responses. 

(iii) But Bayes’ theorem fails to describe the contribution of prior information. In 
signal detection and two-choice reaction-time experiments performance fluctuates 
from trial to trial about a dynamic equilibrium that does not correspond to the optimal 
combination of information from different sources. 



Abstract. The aim of this paper is to test whether conjunctive and disjunctive
judgments are accounted for differently by possibility and probability theories
depending on whether (1) judgments are made on a verbal or a numerical scale,
(2) the plausibility of elementary hypotheses is low or high. Seventy-two subjects had to
rate the extent to which they believed that two characters were individually, in
conjunction or in disjunction, involved in a police case. Scenarios differed on
the plausibility of the elementary hypotheses. Results show that the possibilistic 
model tends to fit the subjects’ judgments in the low plausibility case, and the 
probabilistic model in the high plausibility case. Whatever the kind of scale, the 
possibilistic model matches the subjects’ judgments for disjunction, but only 
tends to do it for conjunction with a verbal scale. The probabilistic model fits 
the subjects’ judgments with a numerical scale, but only for disjunction. These 
results exhibit the polymorphism of human judgment under uncertainty. 



1 Introduction 

Uncertainty is a constitutive aspect of human cognition, and to some extent, the 
human cognitive system is specialized in processing uncertainty. This contrasts with 
the old idea in psychology that people permanently try to avoid or to reduce 
uncertainty [6]. However, such a reduction or avoidance is not always possible. In 
that case, in order to make sense of internal or external events or states, and in order 
to make decisions, people must combine uncertain information efficiently. This leads 
to some questions. How does one represent feelings of knowing, beliefs, doubts, and 
expectancies... accurately? Is there a unique set of formal rules sufficient to describe 
uncertainty combinations by the human cognitive system precisely? Numerous 
psychological studies have been devoted to the search for answers to such questions. 
Not surprisingly, most of them have focused on a probabilistic representation of 
uncertainty. For example, Kahneman, Slovic and Tversky [7] conducted a vast
research program to test human intuitions about, and performance with,
probabilities. The large body of data obtained presents a mixed
picture: people succeed in some classes of problems and fail in others (see [5] and
[7]). A subsequent purpose has been to establish the factors or the conditions under
which people behave normatively. In particular, studies of judgment under 
uncertainty have exhibited the general result that judgments of probability are 
influenced by contextual features that are unrelated to a problem’s formal structure 
[12]. Another strong result is that people commit two kinds of fallacies: the
conjunction and disjunction fallacies. The former is committed when the estimated
probability of a conjunction exceeds the probability of either constituent [17], the
latter when the probability of a disjunction is less than the probability of at least one
of its constituents [9]. These two biases are of particular interest because they are
directly related to crucial rules of uncertainty composition, whatever the
normative framework considered. Another line of research has consisted in testing human
judgment under uncertainty against non-classical probabilistic models of uncertainty.
Several models studied in artificial intelligence present interesting properties from a
psychological point of view. These models include at least the Bayesian approach,
probabilistic logics, belief functions, upper and lower probability systems, and
possibility theory. Except for the Bayesian approach, few or no psychological
studies have focused on these models. Possibility theory has been considered by
Zimmer [20] and Raufaste and Da Silva Neves [11].

This paper extends Raufaste and Da Silva Neves’s previous study, and aims to
gain insight into the conditions under which human judgment under
uncertainty follows basic rules of either probability theory or possibility theory. It
leaves aside other frameworks such as belief functions. To this end,
an experiment was conducted that tests the convergence of human conjunctive
and disjunctive judgments with the possibilistic and the probabilistic models, given
two factors. The first is the kind of scale designed to measure
the subjects’ uncertainty about single, conjunctive and disjunctive hypotheses. Two
scales were tested: an ordinal one and a numerical one. The second factor is the
relative plausibility of competing hypotheses. Three conditions were studied. In the
first one both hypotheses were unlikely, in the second one only one hypothesis was 
unlikely and the other one was very likely, and in the last condition both hypotheses 
were very likely. Moreover, the way each model fits subjects’ judgments was 
compared for conjunctive and disjunctive judgments when judgments produce a high 
level of plausibility on the one hand, and a low level on the other hand. 

Section 2 introduces the probabilistic and possibilistic frameworks, some previous 
empirical results, and our objectives based on these results. Section 3 presents the 
experimental apparatus, the method for the test of our hypotheses and results. Some 
concluding remarks are made in section 4. 



2 Formal Apparatus, Previous Empirical Findings, and Objectives 

According to Zadeh ([19] p. 4) “Contrary to what has become a widely accepted 
assumption -much of the information on which human decisions are based is 
possibilistic rather than probabilistic in nature”. In a previous study, Raufaste and Da 
Silva Neves [11] have shown that human experts might behave in a way that is much 
closer to possibilistic predictions than to probabilistic ones. However, a significant 
correlation between possibilistic measures and probability measures has been found, 
and possibilistic and probabilistic predictions have been differentiated statistically for
conjunction only, and not for simple or exclusive disjunction.
Thus, although it can be concluded that under some conditions “subjective probabilities”
could be reinterpreted as “subjective possibilities”, these conditions are not clear. This
section introduces the two frameworks and some empirical results related to their
empirical validity with regard to human subjective degrees of confidence. Next,
critical results are outlined and our experimental questions are formulated.



2.1 A Brief Recall of Probability Theory and Possibility Theory 



Probability Theory 

A probability measure P is defined on a family of events, each one construed as a set
of possibilities so that (1) for any event A, $P(A) \geq 0$; (2) for an event A' certain to
occur, $P(A') = 1$; (3) the probability of an event equals the sum of the probabilities of
its disjoint outcomes (additivity). In addition, for two independent events A and
B, mathematical probability theory gives

$P(A \cap B) = P(A) \cdot P(B)$ (1)

where $P(A \cap B)$ is the probability of occurrence of both A and B, and P(A) and P(B) are
the probabilities of occurrence of A and B considered separately. In the case of dependent
events,

$P(A \cap B) = P(A) \cdot P(B \mid A) = P(B) \cdot P(A \mid B)$ with $P(B \mid A) = P(A \cap B)/P(A)$ (2)

The probability of occurrence of A or B or both, $P(A \cup B)$, is

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$ (3)

Probability theory has traditionally been used to analyze repetitive chance processes, 
but the theory has also been applied to essentially unique events where probability is 
not reducible to the relative frequency of “favourable” outcomes [17]. 
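
As a hedged illustration of Eqs. 1 to 3, the sketch below computes the probabilistic conjunction and disjunction from elementary judgments and flags the two fallacies described in the introduction. The function names and the example values (0.8 and 0.5) are assumptions made for the example; the fallacy checks compare a judged value with the smaller or larger constituent.

```python
def conjunction_independent(p_a, p_b):
    """Eq. 1: P(A and B) for independent events."""
    return p_a * p_b

def conjunction_dependent(p_a, p_b_given_a):
    """Eq. 2: P(A and B) = P(A) * P(B|A)."""
    return p_a * p_b_given_a

def disjunction(p_a, p_b, p_a_and_b):
    """Eq. 3: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

def conjunction_fallacy(judged_conjunction, p_a, p_b):
    """True if the judged conjunction exceeds its less probable constituent."""
    return judged_conjunction > min(p_a, p_b)

def disjunction_fallacy(judged_disjunction, p_a, p_b):
    """True if the judged disjunction falls below its more probable constituent."""
    return judged_disjunction < max(p_a, p_b)

p_a, p_b = 0.8, 0.5  # illustrative elementary judgments
p_both = conjunction_independent(p_a, p_b)
p_either = disjunction(p_a, p_b, p_both)
print(p_both, p_either)
print(conjunction_fallacy(0.6, p_a, p_b), disjunction_fallacy(0.7, p_a, p_b))
```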

Possibility Theory 

Let X be an ill-known variable, and $\Omega$ the set of all the values $\omega$ that X can take.
Zadeh defined a “possibility distribution” $\pi_X: \Omega \to [0, 1]$ which expresses the level
of plausibility of $\omega$, that is the degree to which it is possible that the actual value of the
ill-known variable X is $\omega$. The interval [0, 1] is taken as a set of ordinal values, not
necessarily numeric. Now, if A is an event (i.e., a subset of $\Omega$, e.g. a particular
diagnostic hypothesis), the “possibility measure” that A is correct is
$\Pi(A) = \sup_{\omega \in A} \pi_X(\omega)$. Possibility measures satisfy the property of max-decomposability
for disjunction,

$\Pi(A \vee B) = \max(\Pi(A), \Pi(B))$ (4)

The dual measures of possibility measures are certainty measures defined in such a
way that

$N(A) = 1 - \Pi(\neg A)$ (5)

Certainty measures satisfy min-decomposability for conjunction,

$N(A \wedge B) = \min(N(A), N(B))$ (6)

Moreover, according to [4], under the dependence hypothesis,

$\Pi(A \wedge B) = \min(\Pi(B \mid A), \Pi(A))$ with, when $\Pi(A) > 0$, (7)

$\Pi(B \mid A) = 1$ if $\Pi(A \wedge B) = \Pi(A)$; $= \Pi(A \wedge B)$ otherwise.
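
The possibilistic rules can be sketched on a small finite example. The possibility distribution below (four values with degrees 1.0, 0.7, 0.3 and 0.1) and the two events are hypothetical, chosen only to illustrate max-decomposability for disjunction, min-decomposability of certainty for conjunction, and the conditioning rule of Eq. 7; the function names are likewise assumptions.

```python
def possibility(event, pi):
    """Pi(A): largest possibility degree among the values in A (0 if A is empty)."""
    return max((pi[w] for w in event), default=0.0)

def certainty(event, pi):
    """N(A) = 1 - Pi(not A)."""
    complement = set(pi) - set(event)
    return 1.0 - possibility(complement, pi)

def conditional_possibility(b, a, pi):
    """Pi(B|A): 1 if Pi(A and B) equals Pi(A), else Pi(A and B); needs Pi(A) > 0."""
    joint = possibility(set(a) & set(b), pi)
    return 1.0 if joint == possibility(a, pi) else joint

# Hypothetical possibility distribution over four values of X
pi = {"w1": 1.0, "w2": 0.7, "w3": 0.3, "w4": 0.1}
A = {"w2", "w3"}
B = {"w3", "w4"}

print(possibility(A | B, pi), max(possibility(A, pi), possibility(B, pi)))
print(certainty(A & B, pi), min(certainty(A, pi), certainty(B, pi)))
print(possibility(A & B, pi),
      min(conditional_possibility(B, A, pi), possibility(A, pi)))
```

Each printed pair compares the left-hand and right-hand sides of one of the decomposition rules for the chosen events.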



2.2 Empirical Results 

Generally, psychologists have cast a substantial doubt on the generality of the 
hypothesis that the subjects’ probability estimates are related in a way described by 
the laws of mathematical probability theory. However, it has been found that while 
there was not a perfect conformity between the subjects’ judgments and the proper 
combinations of the probabilities for elementary events, rules from probability theory 
yielded better descriptions than some alternate improper rules did [1]. More recently, 
[12] have found differences in probabilistic reasoning as a function of whether 
problems were presented in a frequentist or case-specific form. These different forms 
influence the likelihood of subjects committing the conjunction and disjunction 
fallacies. Furthermore, it has been found that verbal probabilities do not simply reflect 
the objective level of uncertainty, but are also determined by how this degree of 
uncertainty is brought about [14]. In everyday life, uncertainties are most commonly 
expressed through verbal phrases, like “possibly”, “perhaps”... although some 
numerical estimates, usually given as percentages also have become a part of lay 
vocabulary of probability and risk. Attempts to quantify verbal expressions of 
uncertainty have demonstrated that different terms typically refer to different levels of 
probability [8] [2] [10]. For example, at the group level, several different studies
conclude that the expression probable is used to express a mean subjective probability
in the range of .70-.80, whereas improbable typically refers to a probability in the
range of .12-.20. However, a large variability has been found in individual numerical
estimates of verbal probability phrases [16]. When alternatives are defined to be 
approximately equivalent, high probability terms can be used to characterize low- 
probability outcomes whereas such usage is less frequent when a dominant alternative 
is available. 

Another issue of particular interest is related to the verbal versus numerical 
presentation mode. Research on this topic exhibits contradictory results. On the one 
hand, no significant main effect (or very small differences) of the presentation mode 
of expressed probabilities has been found [3]. On the other hand, it is conceivable that 
information is processed differently when the final output is to be linguistic rather 
than numeric [18]. In addition, Zimmer [20] presented some data suggesting that 
linguistic information is actually processed more optimally in reasoning than direct 
numerical expressions, even if the tasks performed rely on frequency information. 
Interestingly, Zimmer used possibility theory as a framework for modeling the 
individual usage of verbal categories for grades of uncertainty. It should also be
noted that he obtained empirical evidence that, in verbal judgments, people are not
prone to the conjunction fallacy. Zimmer's results agree with Raufaste and Da Silva
Neves's finding that possibilistic and probabilistic predictions can be differentiated
statistically for conjunction [11]. However, it must be remembered that this result has
not been found with disjunction. This last result can be explained a posteriori by
contextual characteristics and by the properties of possibility measures. Indeed,
Raufaste and Da Silva Neves’s experiment was conducted with practising
radiologists who had to provide and to rate plausible hypotheses about the pathologies
suggested by the radiological film. As a consequence, the mean plausibility of
subjective judgments was quite high (the mean values were around 80 with scales that
ranged from 0 to 100). Now, recall that maximizing always occurs with possibility
measures, but not with certainty measures. Thus, it cannot be excluded that
maximizing would occur with less plausible hypotheses. This hypothesis is
strengthened by questions of relevance in the use of possibility and necessity 
measures. Indeed, intuitively, when some hypotheses to be evaluated are already 
known as plausible, reasoning in terms of degrees of possibility does not appear to be 
very informative. Conversely, when some hypotheses to be evaluated are already 
known as unlikely, reasoning in terms of degrees of necessity no longer appears to be 
very relevant. 

2.3 Objectives 

At least two questions emerged from this review of previous empirical results: 

(1) Are conjunctive and disjunctive judgments accounted for differently by possibility
theory and probability theory depending on whether judgments are made on a verbal
or a numerical scale?

(2) Does the degree of fit between the subjects’ judgments and the possibilistic
and probabilistic models depend on the plausibility of the hypotheses?

The next section presents the experiment designed to explore
these questions.



3 Experiment 

In order to answer the questions above, an experiment was conducted where the 
subjects’ confidence judgments about direct, conjunctive and disjunctive hypotheses 
were compared to the values computed from the possibilistic and the probabilistic 
models. The latter were computed from the subjects’ confidence judgments. The
data collection procedure and the principles of data analysis are described first.
Next, the results of this experiment are presented and discussed.

3.1 Subjects 

The subjects were 72 first-year psychology students at the University of Toulouse-Le
Mirail, all native speakers of French.



3.2 Material 

The material consisted of 3 short scenarios randomly presented in booklets. In each
scenario (see Table 1), a detective inspector investigates a murder case and retains
two main suspects. The 3 scenarios differ mainly in the intuitive plausibility of the
suspects’ culpability. In scenario 1, the two hypotheses are weakly plausible (-/-
condition). In scenario 2, they are strongly plausible (+/+ condition). In scenario 3,
one hypothesis is strongly plausible and the other is unlikely (+/- condition).

Table 1. An example of the kind of scenario presented to the subjects (translated from the
French original).

Scenario 1 (condition -/-)

A small bank in the district has been burgled. At the end of his investigation, the detective inspector focused
on two suspects, well known to the police, Aurelien and Boris. However, both Aurelien and Boris had a
good alibi: several persons attested they had seen them at the very moment of the burglary. In addition, the
detective knew that it was not impossible that the burglar or burglars be someone else independent from
Aurelien and Boris, but he knew no more.

In addition, the material involved the following 5 questions:

Q1: To what extent do you believe that Aurelien is involved in the burglary?

Q2: To what extent do you believe that Boris is involved in the burglary?

Q3: To what extent do you believe that Aurelien and Boris are both involved in the burglary?

Q4: To what extent do you believe that Aurelien or Boris or both are involved in the burglary?

Q5: To what extent do you believe that Aurelien or Boris but not both is involved in the burglary?



3.3 Design and Procedure 

The subjects were informed that they had to read the three scenarios carefully, in the 
given order, and to answer the 5 questions of each scenario (see above). They were 
also informed that they had to answer the questions by checking the point (in the 
verbal condition, the square) of the scale that best matched their judgment. Two kinds 
of scales were randomly assigned to subjects (with only one kind of scale per subject): 
a numerical scale and a verbal scale (see Figure 1). 



3.4 Predictions and Analysis 

In order to study the effects of the response format (verbal versus numerical) and of 
the relative plausibility of both hypotheses on subjective judgments with regard to the 
possibilistic and the probabilistic conjunctive and disjunctive rules of composition, 
the subjects’ responses on the verbal scale were encoded in a numerical format and 
the subjects’ responses on the numerical scale were encoded in a verbal form, 
according to the rules given in Table 2. The numerical encoding values of the verbal 
response modalities were obtained by dividing a scale ranging from 0 to 1 into 6 
equal intervals. The interval bounds provided the numerical values associated with 
the verbal modalities. The verbal encoding values of numerical responses were 
obtained in a more sophisticated manner. Indeed, applying such an encoding makes 
sense if we suppose that, when checking the verbal scale, subjects make use of some 
implicit metric representation, so that a check in the left part of the square of a given 
verbal descriptor indicates a lesser degree of confidence than a check in its right part. 
Moreover, following the same logic, some correspondence should exist between the 
bounds of the squares and the corresponding values projected on the numerical scale. 
These values provided the bounds of the numerical intervals that encode verbal 
modalities. Because the comparison with direct numerical judgments supposes only 
one value per modality, the choice was made to apply the same encoding rule as for 
ordinal modalities. 



Numerical scale: 0 %  5  10  15  20  25  30  35  40  45  50  55  60  65  70  75  80  85  90  95  100 

Verbal scale: Impossible | Not very possible | Quite possible | Possible but not certain | 
Probable | Very probable | Certain 

Fig. 1. Scales for the subjects’ answers 



Given the hypothesis that the subjects’ judgments conform to the possibilistic framework, 
the basic combinatorial relations (see section 2.1) should predict the subjects’ estimates, 
and it should be observed that: 

P1: When the subjects’ estimates of the conjunctive hypothesis (question 3) are over .5 
(entirely possible), under the independence hypothesis, the possibilistic model 
($N(A \wedge B) = \min(N(A), N(B))$) should fit the subjects’ conjunctive judgments 
better than the probabilistic model does ($P(A \cap B) = P(A) \cdot P(B)$). 

The values of $N(A \wedge B)$ and $P(A \cap B)$ are mean values. $N(A)$ and $P(A)$ have 
the same value, which is the mean value of the subjects’ answers to question 1 (see 
Table 1). $N(B)$ and $P(B)$ also have the same value, computed from the answers to 
question 2. 

P2: When the subjects’ estimates of the conjunctive hypothesis (question 3) are over .5 
(entirely possible), under the non independence hypothesis, the possibilistic model 
($\Pi(A \wedge B) = \min(\Pi(B/A), \Pi(A))$) should fit the subjects’ conjunctive 
judgments better than the probabilistic model does ($P(A \cap B) = P(A) \cdot P(B/A)$). 

$\Pi(A)$ and $P(A)$ are computed according to the same rule as above. For the 
computation of $P(B/A)$, the mean value of the subjects’ answers to question 3 is 
divided by the mean value of the answers to question 1. For the computation of 
$\Pi(B/A)$, the mean value of the answers to question 4 is subtracted from the mean 
value of the answers to question 5. The same is done with questions 4 and 2, 
respectively. Then, the result of the first subtraction is divided by the result of the 
second one. 

P3: When the subjects’ estimates of the disjunctive hypothesis (question 4) are under .5 
(not entirely possible), the possibilistic model ($\Pi(A \vee B) = \max(\Pi(A), \Pi(B))$) 
should fit the subjects’ disjunctive judgments better than the probabilistic model does 
($P(A \cup B) = P(A) + P(B) - P(A \cap B)$). 
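
The composition rules compared in P1 to P3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are ours, and the answers q1, q2 and q3 are hypothetical values on the 0 to 1 scale, not data from the experiment. 

```python
def necessity_conjunction(n_a, n_b):
    """Possibilistic rule under independence: N(A and B) = min(N(A), N(B))."""
    return min(n_a, n_b)


def probability_conjunction(p_a, p_b):
    """Probabilistic rule under independence: P(A and B) = P(A) * P(B)."""
    return p_a * p_b


def possibility_disjunction(pi_a, pi_b):
    """Possibilistic rule: Pi(A or B) = max(Pi(A), Pi(B))."""
    return max(pi_a, pi_b)


def probability_disjunction(p_a, p_b, p_a_and_b):
    """Probabilistic rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


def conditional_probability(p_a_and_b, p_a):
    """P(B/A), estimated as the answer to question 3 divided by the answer to question 1."""
    if p_a == 0:
        raise ValueError("P(B/A) is undefined when P(A) is 0")
    return p_a_and_b / p_a


# Hypothetical answers to questions 1 (A), 2 (B) and 3 (A and B).
q1, q2, q3 = 0.66, 0.33, 0.33

predictions = {
    "N(A and B), min rule": necessity_conjunction(q1, q2),
    "P(A and B), product rule": probability_conjunction(q1, q2),
    "Pi(A or B), max rule": possibility_disjunction(q1, q2),
    "P(A or B), additive rule": probability_disjunction(
        q1, q2, probability_conjunction(q1, q2)
    ),
    "P(B/A)": conditional_probability(q3, q1),
}

for label, value in predictions.items():
    print(f"{label}: {value:.4f}")
```

Each predicted value would then be compared with the judgment actually given to the corresponding question (question 3 for conjunction, question 4 for disjunction). 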

These tests were made for each scenario, given the two kinds of transformation 
applied to the subjects’ judgments. In order to compute the fit of the models with the 
subjects’ estimates, two kinds of statistical tests were applied. The first is the 
computation of a correlation coefficient between the subjects’ direct responses and the 
predicted values. The second is the test of the equality of the mean values computed 
over the subjects’ responses on the one hand and over the predicted values on the 
other hand. The applied tests were Spearman’s rho test of correlation and the 
Wilcoxon rank test (see Siegel and Castellan [14]). 

Table 2. Rules of encoding 

Numerical encoding of verbal measures        Ordinal encoding of numerical measures 

1- Impossible                 →  .00         [0 .14]     →  1  →  .00 
2- Not very possible          →  .16         [.14 .28]   →  2  →  .16 
3- Quite possible             →  .33         [.28 .42]   →  3  →  .33 
4- Possible but not certain   →  .50         [.42 .58]   →  4  →  .50 
5- Probable                   →  .66         [.58 .72]   →  5  →  .66 
6- Very probable              →  .83         [.72 .86]   →  6  →  .83 
7- Certain                    →  1.00        [.86 1.00]  →  7  →  1.00 
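
The two encodings of Table 2 can be written as simple lookups. The sketch below is an illustration only: the function names are ours, numerical responses are assumed to be given in percent (as on the numerical scale of Fig. 1), and the upper bound of each interval is assumed to belong to that interval, a detail that Table 2 does not specify. 

```python
# Numerical encoding of the verbal modalities (left part of Table 2).
VERBAL_TO_NUMERIC = {
    "Impossible": 0.00,
    "Not very possible": 0.16,
    "Quite possible": 0.33,
    "Possible but not certain": 0.50,
    "Probable": 0.66,
    "Very probable": 0.83,
    "Certain": 1.00,
}

# Upper bounds of the seven intervals (right part of Table 2).
# Assumption: each upper bound is included in its interval.
ORDINAL_UPPER_BOUNDS = [0.14, 0.28, 0.42, 0.58, 0.72, 0.86, 1.00]

# Value associated with each ordinal rank, 1 to 7.
ORDINAL_VALUES = [0.00, 0.16, 0.33, 0.50, 0.66, 0.83, 1.00]


def verbal_to_numeric(label):
    """Return the numerical value that encodes a verbal modality."""
    return VERBAL_TO_NUMERIC[label]


def numeric_to_ordinal(percent):
    """Return the ordinal rank (1 to 7) of a response given in percent."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie between 0 and 100")
    value = percent / 100
    for rank, upper in enumerate(ORDINAL_UPPER_BOUNDS, start=1):
        if value <= upper:
            return rank
    return len(ORDINAL_UPPER_BOUNDS)


def ordinal_to_value(rank):
    """Return the value associated with an ordinal rank."""
    return ORDINAL_VALUES[rank - 1]


# A response of 60 % falls in [.58 .72], hence rank 5 ("Probable").
rank = numeric_to_ordinal(60)
print(rank, ordinal_to_value(rank))
```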



3.5 Results 

Fifteen ANOVAs (analyses of variance) were computed, one for each of the 5 
questions crossed with each scenario, in order to test for a potential effect of the 
presentation order of the scenarios. No order effect was found. In order to test the 
effect of the plausibility of the conjunctive and disjunctive hypotheses on the fit of the 
possibilistic and probabilistic models, two kinds of analyses were made. The 
first one consisted in comparing the fit of the models with the subjects’ judgments for 
each scenario (C-/-, C+/- and C+/+, see section 3.2). In order to conduct the second 
kind of analysis, subjects were first assigned to one of two conditions, Pl+ (high 
plausibility) and Pl- (low plausibility), according to the following criteria: (1) when 
plausibility judgments were under or equal to .5 (i.e. from impossible to possible but 
not certain), subjects were assigned to the Pl- condition; (2) when plausibility 
judgments were over .5 (i.e. from probable to certain), subjects were assigned to the 
Pl+ condition. Next, the fits of the models with the subjects’ judgments under Pl+ and 
Pl- were compared for conjunction and disjunction according to the rules given 
in section 3.4. Table 3 summarizes the main results needed for the relevant 
comparisons. Of particular interest are the comparisons labeled Data/$\Pi$ (vs. $N$) and 
Data/P, where Data represents the subjects’ judgments, $\Pi$ (vs. $N$) represents the mean 
computed by the possibilistic model ($N$ for min composition and $\Pi$ for max 
composition), and P represents the mean computed by the probabilistic model. The sig 
value (placed between brackets in the table) computed from the z coefficient 
(Wilcoxon test) represents the probability of rejecting by error the null hypothesis that 
the two means are equal. When the sig value is under the .05 level, the difference is 
judged significant; otherwise it is not. In addition, the sig value (also 
placed between brackets) computed from the rho coefficient of correlation (Spearman 
test) represents the probability of rejecting by error the null hypothesis that the two 
distributions are uncorrelated. The same decision criterion is applied. In order to 
conclude that one model fits the data better than the other one, it must be found that 
the rho coefficient value is significant at least for the former model (whatever the 
latter) and that the z value is not significant for the former and is significant for the 
latter. 
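
The decision criterion just stated can be made explicit as follows. This is a minimal sketch, not the procedure actually run for the analyses: the names Fit and fits_better are ours, and the example values are the sig values reported in Table 3 for disjunction in the C-/- scenario. 

```python
from typing import NamedTuple

ALPHA = 0.05  # significance level used throughout the analyses


class Fit(NamedTuple):
    """Sig values obtained for one model in one comparison with the data."""
    rho_sig: float  # sig of the Spearman rho coefficient
    z_sig: float    # sig of the Wilcoxon z coefficient


def fits_better(former, latter, alpha=ALPHA):
    """Return True if the former model fits the data better than the latter.

    The former model must correlate significantly with the data, its mean
    must not differ significantly from the data mean, and the mean of the
    latter model must differ significantly from the data mean.
    """
    former_correlated = former.rho_sig < alpha
    former_mean_equal = former.z_sig >= alpha
    latter_mean_differs = latter.z_sig < alpha
    return former_correlated and former_mean_equal and latter_mean_differs


# Disjunction, C-/- scenario (sig values as reported in Table 3).
possibilistic = Fit(rho_sig=0.000, z_sig=0.25)
probabilistic = Fit(rho_sig=0.000, z_sig=0.000)

print(fits_better(possibilistic, probabilistic))
print(fits_better(probabilistic, possibilistic))
```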

Table 3. Correlations (rho) and differences (z) between the subjects’ judgments (Data) and 
possibility theory ($N$ for conjunction and $\Pi$ for disjunction), and between Data and probability 
theory (P), under Pl+ and Pl-. N represents the sample size and sig represents the probability that 
the hypothesis of no difference between the mean values be rejected by error. 

Under the independence assumption 

Conjunction                     C-/-            C+/-            C+/+ 
Pl+   Data/N   Z (sig.)         -1.34 (.18)     -3.43 (.001)    -2.94 (.003) 
               Rho (sig.)        .81 (.18)       .04 (.86)       .44 (.12) 
      Data/P   Z (sig.)         -1.84 (.07)     -3.73 (.000)    -3.18 (.001) 
               Rho (sig.)        .81 (.18)       .11 (.67)       .32 (.26) 
      N                          4               18              14 

Disjunction                     C-/-            C+/-            C+/+ 
Pl-   Data/Π   Z (sig.)         -1.14 (.25)     -2.38 (.02)     -.75 (.45) 
               Rho (sig.)        .81 (.000)      .57 (.000)      .36 (.04) 
      Data/P   Z (sig.)         -5.37 (.000)    -4.71 (.000)    -4.29 (.000) 
               Rho (sig.)        .81 (.000)      .62 (.000)      .52 (.002) 
      N                          66              39              33 

Under the non independence assumption 

Conjunction                     C-/-            C+/-            C+/+ 
Pl+   Data/Π   Z (sig.)         -.65 (.52)      -.06 (.95)      -2.37 (.02) 
               Rho (sig.)        .92 (.000)      .83 (.000)      .85 (.000) 
               N                 68              68              68 
      Data/P   Z (sig.)         -1.19 (.24)     -0.02 (.98)     -2.51 (.012) 
               Rho (sig.)       -.28 (.24)       .26 (.07)       .37 (.004) 
               N                 14              51              59 



The comparisons between scenarios of the mean values of the subjects’ judgments for 
elementary hypotheses have shown that the subjects endorsed the a priori plausibility 
of the hypotheses. 



Test of the Plausibility Level 

Examination of Table 3 shows that, for disjunction, the possibilistic model fits the 
subjects’ disjunctive judgments in the C-/- and C+/+ scenarios but not in the C+/- 
scenario. The probabilistic model never fits the subjects’ data. These results are 
consistent with Raufaste and Da Silva Neves’s findings. Indeed, in the previous study, 
no model fitted the subjects’ disjunctive judgments, but the latter were highly plausible 
(> .80). In the present study, the disjunctive hypothesis which received significantly 
the highest plausibility (C+/-: mean = .62; σ = .25; the difference with the second 
most plausible hypothesis is significant at the .05 level) was precisely that which did 
not fit any model. For conjunction, Table 3 shows that under the independence 
assumption and under Pl+, whatever the scenario, none of the models fits the subjects’ 
conjunctive judgments. Under the non independence assumption, the probabilistic 
model did not fit the data whatever the scenario, whereas the possibilistic model fits 
the data in the C-/- and C+/- scenarios. The fact that the condition that is not fitted is 
the C+/+ one is consistent with the hypothesis that the possibilistic model applies better 
to the less plausible hypotheses. On the whole, these results suggest that the 
possibilistic model tends to fit the subjects’ judgments when they produce a low or 
only a “fair” plausibility. 

Table 4. Correlations (rho) and differences (z) between the subjects’ judgments (Data) and 
possibility theory ($N$ for conjunction and $\Pi$ for disjunction), and between Data and probability 
theory (P), in the verbal and numerical conditions. sig represents the probability that the 
hypothesis of no difference between the mean values be rejected by error. N = 36. 

Under the independence assumption 

Conjunction                                 C-/-            C+/-            C+/+ 
Verbal               Data/N   Z (sig.)      -.05 (.96)      -.40 (.69)      -3.18 (.001) 
                              Rho (sig.)     .76 (.000)      .25 (.14)       .37 (.02) 
                     Data/P   Z (sig.)      -5.05 (.000)    -3.55 (.000)    -2.17 (.03) 
                              Rho (sig.)     .76 (.000)      .26 (.13)       .36 (.03) 
Numerical (n = 36)   Data/N   Z (sig.)      -.21 (.83)      -1.15 (.25)     -.15 (.88) 
                              Rho (sig.)     .82 (.000)      .41 (.01)       .65 (.000) 
                     Data/P   Z (sig.)      -3.49 (.000)    -2.53 (.01)     -3.16 (.002) 
                              Rho (sig.)     .82 (.000)      .48 (.003)      .67 (.002) 

Disjunction                                 C-/-            C+/-            C+/+ 
Verbal               Data/Π   Z (sig.)      -.71 (.48)      -.22 (.82)      -.38 (.71) 
                              Rho (sig.)     .86 (.000)      .46 (.004)      .51 (.002) 
                     Data/P   Z (sig.)      -4.55 (.000)    -4.16 (.000)    -4.54 (.002) 
                              Rho (sig.)     .86 (.000)      .42 (.01)       .48 (.003) 
Numerical (n = 36)   Data/Π   Z (sig.)      -1.02 (.31)     -.69 (.49)      -1.81 (.07) 
                              Rho (sig.)     .84 (.000)      .65 (.000)      .79 (.000) 
                     Data/P   Z (sig.)      -2.98 (.003)    -1.73 (.08)     -1.4 (.16) 
                              Rho (sig.)     .84 (.000)      .60 (.000)      .83 (.000) 



Test of the Presentation Scale (Verbal vs. Numerical) 

Examination of Table 4 shows that, for conjunction (computed only under the 
independence assumption), with the numerical scale, the possibilistic model fits the 
subjects’ judgments whatever the scenario, whereas it fits only the C-/- condition with 
the verbal scale: there, the data did not fit any model in the C+/- and C+/+ scenarios. 
For disjunction, with the verbal scale, the possibilistic model fits the subjects’ 
judgments whatever the scenario, whereas the probabilistic model does not. With the 
numerical scale, the possibilistic model fits the C-/- scenario, the probabilistic model 
fits the C+/+ scenario, and both models competed for the fit with the C+/- scenario. 
These results exhibit an interaction effect between the kind of scale (verbal vs. 
numerical) and the kind of judgment (conjunctive vs. disjunctive). With a verbal scale, 
the probabilistic model does not fit the subjects’ judgments, while with a numerical 
scale, it tends to fit the subjects’ judgments only for disjunction. On the other hand, 
whatever the kind of scale, the possibilistic model fits the subjects’ judgments for 
disjunction and only tends to do so for conjunction with a verbal scale. 



4 Conclusion 

The aim of this study was (1) to test whether conjunctive and disjunctive judgments are 
accounted for differently by possibility theory and probability theory depending on 
whether judgments are made on a verbal or a numerical scale, and (2) to test whether 
the degree of adjustment between the subjects’ judgments and both the possibilistic 
and probabilistic models depends on the relative plausibility of the combined 
hypotheses. An experiment was conducted in which 72 subjects had to rate, in 3 
different scenarios, the extent to which they believed that two characters were (i) each 
one, (ii) both, (iii) only one or both, (iv) only one, involved in a police case. The 
scenarios differed in the plausibility level of the hypotheses that each character be 
involved in the case. Our results appeared to be consistent with previous findings. In 
addition, it was found that the possibilistic model tends to fit the subjects’ 
judgments when they produce a low plausibility. On the contrary, given additional 
results not presented here, the probabilistic model tends to fit the subjects’ judgments 
when they produce a high plausibility. Moreover, an interaction effect between the 
kind of scale (verbal vs. numerical) and the kind of judgment (conjunctive vs. 
disjunctive) was found. In particular, whatever the kind of scale, the possibilistic 
model fits the subjects’ judgments for disjunction but only tends to do so for 
conjunction with a verbal scale. Finally, the probabilistic model fits the subjects’ 
judgments with a numerical scale, but only for disjunction. These results remain to be 
explained, but emphasize the psychological plausibility of the possibilistic model. 
They also suggest that several models are needed in order to describe formally the 
rules that underlie the subjects’ judgments. Probability theory is such a model. 
Because part of the subjects’ judgments has been accounted for neither by 
possibility theory nor by probability theory, further work should focus on other 
frameworks such as, for example, the Dempster-Shafer one [see 13]. 
Abstract. Psychological studies of reasoning with simple conditional 
arguments have shown that about one half of the participants do not 
consider the conclusion as certain when some specific information is 
added to the premises, explicitly or implicitly. This nonmonotonic effect is 
explained by generalising Mackie's [8] analysis of conditionals within the 
framework of Relevance theory (Sperber & Wilson [12]): Conditionals are 
uttered with a ceteris paribus assumption of normality; calling in question 
this assumption induces doubt in the conditional: This is what 
characterises additional premises used in the experiments mentioned 
above as well as in ordinary conversation. 



1 Introduction 

In the past twelve years or so, a number of psychological studies of deduction 
using essentially the simple arguments Modus Ponendo Ponens and Modus 
Tollendo Tollens have shown the following phenomenon. After the explicit or 
implicit addition of a piece of information to the premises by various 
experimental procedures (a few of which will be described succinctly), the 
proportion of people who endorse the conclusion as certainly true is typically 
cut by one half, as compared to the usual rate of endorsement. This effect can 
be qualified as nonmonotonic as the major premise of the original argument 
operates like a general rule and the additional information like a 
specification leading to an apparent inconsistency, which results in 
participants' retraction. A general hypothesis on knowledge representation 
associated with conditionals will be outlined, on the basis of which a 
pragmatic explanation of the effect will be proposed. 
2 Some Psychological Studies of NonMonotonic Reasoning 

2.1 Adding Premises Explicitly 

Byrne [2] asked one control group of participants to solve standard arguments 
such as, for Modus Ponendo Ponens (MP): If she meets her friend, then she will 
go to a play; she meets her friend; therefore: (a) she will go to a play; (b) 
she will not go to a play; (c) she may or may not go to a play. As is commonly 
observed, over 95 percent of the participants chose option (a). An 
experimental group was asked to solve the same arguments modified by the 
addition of a third premise, if she has enough money, then she will go to a 
play. The result is that fewer than 40 percent in this group chose option (a). A 
similar effect was observed with Modus Tollendo Tollens (MT). Notice the 
special structure of the argument: the third (additional) premise was a 
conditional that had a necessary condition in its antecedent; since it had the 
same consequent as the major premise, it contained, in fact, a necessary 
condition for the consequent of the major premise and served as a means of 
introducing it in the context. 

Chan & Chua [3] introduced a refinement on Byrne's [2] paradigm. Using 
various non causal conditional rules as premises, for each of them they 
defined three necessary conditions for the consequent to hold; these conditions 
varied in importance (that is, in degree of necessity independently estimated 
by judges). For example, with a MP whose major premise was If Steven is 
invited then he will attend the party the three degrees of necessity were 
introduced each time by an additional premise: If Steven knows the host well 
then he will attend the party (low degree), or If Steven knows at least some 
people well then he will attend the party (intermediate degree), or If Steven 
completes the report tonight then he will attend the party (high degree). 
The response options were (a) he will attend the party; (b) he will not attend 
the party; (c) he may or may not attend the party. It was observed that the 
rate of (a) answers to these three-premise arguments was a decreasing 
function of the degree of necessity. Similar results were obtained for MT. In 
brief, the statement of an additional conditional premise which contained in 
its antecedent a necessary condition for the consequent to occur diminished the 
rate of endorsement of the conclusion all the more sharply as the condition 
was rated as more important. 

Stevenson and Over's [13] first experiment had two controls and five 
experimental conditions. The first control was a standard argument, e.g. (for 
MP), If John goes fishing, he will have a fish supper; John goes fishing, whose 
conclusion was evaluated on a five-option scale: John will have a fish supper; 
... will probably have ...; ... may or may not have ...; ... probably won't have ...; 
... won't have .... The second control was similar to Byrne's [2] experimental 
condition: there was a third premise, a conditional whose antecedent was a 
necessary condition for the consequent of the major premise: If John catches a 
fish, he will have a fish supper. The five experimental conditions had a 
fourth premise that informed the participant about the likelihood of the 
satisfaction of the necessary condition in the third premise: John is always 
lucky; ... almost always ...; ... sometimes ...; ... rarely ...; ... very rarely .... 
While in the second control condition Byrne's results were replicated, the 
effect of the fourth premise on both MP and MT was to decrease the rate of 
endorsement of the conclusion and correlatively to increase the uncertainty 
ratings on the five-point scale in a near-monotonic fashion across conditions. 
In brief, the manipulation of degrees of necessity resulted in functionally 
related degrees of belief in the conclusion of the arguments. 

In their second experiment the same authors used three-premise arguments 
in which the second premise was a categorical sentence that introduced 
various levels of frequency directly into the necessary condition. For example, 
given the major premise If John goes fishing, he will have a fish supper, 
there were five levels in the second premise: John always catches a fish when 
he goes fishing; ... almost always ...; ... sometimes ...; ... almost never ...; ... 
never .... For both MP and MT the rate of endorsement of the conclusion 
decreased monotonically as the frequency mentioned in the second 
(categorical) premise decreased (with a floor effect on the two smallest 
frequencies). In brief, the denial, and the explicit introduction of various 
degrees of doubt in the satisfaction of a condition necessary for the consequent 
to occur diminished the rate of endorsement of the conclusion and the greater 
the doubt, the greater the decrease. 

Manktelow and Fairley [9] manipulated the extent to which a necessary 
condition is satisfied: with a low degree of satisfaction the consequent is less 
likely to occur and with a high degree it is more likely to occur. The control 
argument was a standard MP with the major premise If you pass your exams, 
you will get a good job and there were four experimental arguments made of 
this MP augmented with one of the following premises: (i) got very low grade; 
(ii) got low grade; (iii) got respectable grade; (iv) got excellent grade. The 
conclusion had to be assessed on a 7-point scale (from very low to very high 
certainty to be offered a good job). For the first two experimental arguments 
the certainty ratings were below the control (and lower for the very low grade 
condition than for the low grade condition). For the last two, the certainty 
ratings were above the control (and higher for the excellent grade condition 
than for the respectable grade condition). In brief, they found that the degree 
of certainty of the conclusion was an increasing function of the degree to which 
a necessary condition is satisfied. 

Politzer & Bourmaud [11] used five different MT arguments such as If 
somebody touches an object on display then the alarm is set off; the alarm was 
not set off; conclusion: nobody touched an object on display (to be evaluated on 
a five-point scale ranging from certainly true to certainly false). This was a 
control; in the three experimental conditions, degrees of credibility in the 
conditional were defined by way of an additional premise that provided 
information on a necessary condition for the consequent to occur: High 
credibility: there was no problem with the equipment; Low: there were some 
problems with the equipment; Very low: the equipment was totally out of 
order. The coefficients of correlation between level of credibility and belief 
in the truth of the conclusion ranged between .48 and .71 and were highly 
significant. This result provides a wide generalisation for the previous 
investigations to the extent that (i) the kind of rule used was not limited to 
causals but included also means-end, remedial, and decision rules, (ii) there 
were several degrees of credibility, (iii) the major and minor premises were 
kept constant across levels of credibility, and (iv) the format of evaluation of 
the conclusion was sensitive enough to enable the expression of various degrees 
of belief. 

In summary, the studies reviewed so far show that with simple conditional 
arguments (MP or MT) a majority of people become less certain of the 
conclusion, and consequently are reluctant to rate the conclusion as true, when a 
premise which has the following property is added: That premise questions 
the truth of a condition which (i) is necessary for the consequent of the major 
conditional premise to be true, and (ii) at the same time, must be assumed to be 
true if the major conditional is to be credible. The next few studies differ 
slightly in that they show that similar effects can be obtained without 
explicit additional information. 

2.2 Adding Premises Implicitly 

Studies by Cummins [4] and Cummins, Lubart, Alksnis, and Rist [5] were 
focused on MP and MT arguments with causal conditionals. They demonstrated 
that the acceptance rate of the conclusion was a decreasing function of the 
number of disabling conditions available, that is, conditions whose 
satisfaction is sufficient to prevent an effect from occurring (and whose non 
satisfaction is therefore necessary for the effect to occur). For example, of the 
following two MP arguments, If the match was struck, then it lit; the match 
was struck / it lit and If Joe cut his finger, then it bled; Joe cut his finger / it 
bled, people are less prone to accept the conclusion of the first, which has 
many disabling conditions, than the conclusion of the second, which has few. 
Thompson [14] obtained similar results with causals and also non causal rules 
such as obligations, permissions and definitions by using conditionals that 
varied in 'perceived sufficiency' (independently rated by judges). A sufficient 
relationship was defined as one in which the consequent always happens 
when the antecedent does. It was observed that the endorsement rate of the 
conclusion was an increasing function of the level of sufficiency. This 
manipulation can also be described by saying that the conditional premises 
differed by the number of necessary conditions, whether negative (disabling 
conditions) or positive (called 'enabling' conditions, that is, conditions whose 
satisfaction is necessary for an effect to occur). Clearly these conditions are 
less likely to be all satisfied when this number is high than when it is low, 
hence the difference in the acceptance rate of the valid conclusion, assuming 
that participants in these experiments were aware of this fact. 

George [7] manipulated directly the credibility of the conditional premise 
of MP arguments. Two groups of participants received contrasted instructions. 
One group was asked to assume the truth of debatable conditionals such as If a 
painter is talented, then his/her works are expensive but the other group was 
reminded of the uncertain status of such statements. While 60 percent in the 
first group endorsed the conclusion of at least three of the four MP arguments, 
only 25 percent did in the second group. When asked to assume the truth of such 
conditionals, participants were invited to dismiss possible objections 
(necessary conditions) like the painter must be famous, whereas stressing the 
uncertainty of the statement is a way to invite them to take such objections 
into account. 

Newstead, Ellis, Evans, and Dennis [10] and Evans and Twyman-Musgrove 
[6] studied MP and MT arguments whose major conditional premise differed 
from the point of view of the 'speech act' they conveyed; they observed 
differences in the rate of endorsement of the conclusion: promises and threats 
on the one hand, and tips and warnings on the other hand seem to constitute 
two contrasted groups, the former giving rise to more frequent endorsements of 
the conclusion than the latter. The authors note that the key factor seems to 
be the extent to which the speaker has control over the occurrence of the 
consequent, which is higher for promises and threats than for tips and 
warnings. Weaker control implies greater difficulty to ensure the satisfaction 
of the necessary condition for the consequent to occur, hence less certainty that 
it will follow. 

3 The Representation of Conditionals in Relation with the 
Knowledge Base 

The following three related claims are made: 

(i) conditionals are uttered against a background of knowledge, of which they 
explicitly link two units (the antecedent and the consequent), keeping 
implicit the rest of it, which will be called a conditional field; 

(ii) the conditional field has the structure of a disjunctive form, as proposed 
by Mackie [8] for causals. The mental representation of a conditional if A then 
C (excluding analytically true conditionals) in its conditional field can be 
formulated as follows: 

$$[(A_m \,\&\, \dots \,\&\, A_1 \,\&\, A) \vee (B_n \,\&\, \dots \,\&\, B_1 \,\&\, B) \vee \dots] \rightarrow C \qquad (1)$$

A is the antecedent of the conditional under consideration; B is an 
alternative condition that could justify the assertion of if B then C in an 
appropriate context. Although such alternative antecedents may play an 
important role, B and its conjuncts will not be considered further here. We focus 
on the abridged form: 

$$(A_m \,\&\, \dots \,\&\, A_1 \,\&\, A) \rightarrow C \qquad (2)$$

While $(A_m \,\&\, \dots \,\&\, A_1 \,\&\, A)$ is a sufficient condition as a whole, each 
conjunct $A_m, \dots, A_1$ separately is necessary with respect to A. These conjuncts 
will be called complementary necessary conditions (henceforth CNC). 

(iii) it is hypothesised that in asserting the conditional if A then C, the 
speaker assumes that the necessity status of the conditions $A_m, \dots, A_1$ is part 
of the shared knowledge, and most importantly that these conditions are 
satisfied. 

This is justified on the basis of relevance. According to relevance theory 
(Sperber and Wilson [12] ), in uttering the conditional sentence, the speaker 
guarantees that the utterance is worth paying attention to, that is, it will 
enable the hearer to derive a new piece of knowledge. But this in turn requires 
that the CNC's be satisfied, failing which the conclusion would not follow. 
The assumption of satisfaction of CNC's can be characterised as an epistemic 
implicature. In brief, conditionals are typically uttered with an implicit 
ceteris paribus assumption to the effect that the normal conditions of the 
world (the satisfaction of the CNC's that belong to shared knowledge) hold. 

An important consequence is that if further information denies or just raises 
doubt on the assumption of satisfaction of the CNC's, (technically, the 
cancellation of an implicature), the conditional sentence no longer conveys a 
sufficient condition. 

In terms of processing cost, the epistemic implicature attached to the 
utterance of the conditional has the advantage that in normal circumstances 
there is no need to explore the knowledge base in order to check whether all 
CNC's are satisfied: their satisfaction is assumed or guaranteed by the 
speaker (to the best of her knowledge). But if the hearer has reasons to be 
cautious about a conditional, he will search in the conditional field for non 
satisfied CNC's. Such reasons are typically based on the level of confidence of
the hearer in the speaker, on the possible existence of alternative sources of 
information which the hearer believes to be unknown to the speaker, etc. 

The experimental data reviewed earlier can now be explained by the 
following common mechanism. These manipulations amount to introducing a
CNC in the context together with a degree of belief in it; this introduction 
often comes explicitly in the form of a third premise [2], [3], [9], [11], [13]. The 
satisfaction of the CNC can be denied [11] or a doubt about a CNC can be
expressed explicitly [13] or implicitly [9]; in the latter case, it can be 
expressed through an implicature triggered by the use of an additional 
conditional whose antecedent precisely is a CNC [2], [3]; sometimes it is the
result of a search in the conditional field triggered by the instructions, or more 
generally, the representation of the task [4], [5], [6], [7], [10], [14]. Recall that 
CNC's (which are necessary conditions for the consequent to occur) complement 
the antecedent of the conditional to make it an actual sufficient condition. It
follows that the degree of belief in the satisfaction of those CNC's acts as a
mediator for the credibility of the conditional (and subsequently, by
inheritance, for the degree of belief in the conclusion of the argument). The
truth status of the conclusion is then treated by degree rather than in an
all-or-nothing manner and this degree is closely correlated to the degree of belief
in the conditional premise. 

Consider, for instance, If somebody touches an object on display then the
alarm is set off. That the equipment be in working condition is one of the 
CNC's for the alarm to be set off, a condition whose satisfaction is implicitly 
warranted by the speaker (say, a technician who has just revised the 
equipment). There is no doubt in the conditional as long as a doubt in the CNC 
is not expressed, explicitly or implicitly. This example shows that belief in 
the conditional crucially depends on the belief in the satisfaction of a CNC, 
and in particular, doubt in the conditional is an increasing function of the 
doubt about the satisfaction of the CNC. If there is a doubt about the 
equipment's being in working condition, that somebody touches an object on
display can no longer be a sufficient condition for the alarm to be set off, 
which is why upon hearing about the state of the equipment one may 
withdraw full belief in the conditional. In order to restore full belief in it, the 
antecedent would have to be complemented with the necessary condition, the 
equipment is in working condition, yielding: If somebody touches an object on 
display and the equipment is in working condition, then the alarm is set off. 
In case there is a doubt about this condition, the conclusion of the MP, the 
alarm is set off is uncertain and it inherits the degree of belief in the 
equipment's being in working condition, while the conclusion of the MT, 
nobody touched an object on display, knowing that the alarm was not set off, 
is also uncertain. 

It is remarkable that in all the experiments reported participants are split 
into two groups: an (often strong) minority endorse the conclusion of the 
argument while a majority do not. In the former case, they seem to have a 
standard logical understanding of the premises: the major conditional premise 
is understood as conveying a sufficient condition and the additional 
information is disregarded. In the latter case, the additional information, 
which has the status of a necessary condition, is treated as if its fulfilment 
were not warranted; consequently the major conditional premise does not 
convey a sufficient condition any more. 
4 Conclusion

The nonmonotonic effects observed in experiments on reasoning from
conditional premises, namely the reluctance to endorse the conclusion or the
expression of a doubt about it, result from the addition of a premise whose
communicated meaning questions the satisfaction of what has been called
earlier a complementary necessary condition (that is, a tacit condition whose
satisfaction is necessary for the consequent to occur, and therefore for the
antecedent to be regarded as sufficient). In investigating the nature of the
credibility of conditionals, researchers have traditionally focused their 
attention on the relation between antecedent and consequent; the present 
approach shows that there is advantage in going one step further, analysing 
the structure of the knowledge base (the conditional field). 

The widely shared view [1] that the credibility of if A, then C is measured
by the conditional probability of the consequent on the antecedent, p(C/A), is a
global approach which is entirely compatible with the analytic approach
taken here: it can easily be demonstrated (but space is lacking here) that
when the satisfaction of a CNC hitherto implicitly assumed becomes
questioned, p(C/A) decreases. In brief, doubt about a conditional is due to a
doubt about a CNC, and the doubt in the former is an increasing function of the
doubt in the latter.
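
The decrease of p(C/A) can be sketched as follows. This is only one possible reconstruction, not the demonstration omitted above: it assumes a single CNC, written N (a symbol introduced here purely for illustration), and it reads "N is necessary for C" as p(C/A & not-N) = 0.

```latex
\begin{align*}
p(C \mid A) &= p(C \mid A \wedge N)\, p(N \mid A) + p(C \mid A \wedge \neg N)\, p(\neg N \mid A) \\
            &= p(C \mid A \wedge N)\, p(N \mid A)
            \qquad \text{since } p(C \mid A \wedge \neg N) = 0 .
\end{align*}
```

Under these assumptions, p(C/A) equals p(C/A & N) as long as the satisfaction of N is taken for granted (p(N/A) = 1), and it shrinks in proportion to p(N/A) as the doubt about N grows.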

A last point worth noticing is that in human communication the existence
of possible defaults (the non satisfaction of CNC's) does not necessitate the
inspection of the knowledge base; the 'burden of the proof' does not concern
normality, but rather abnormality: as noticed earlier, because of the 
guarantee of relevance, a conditional is normally accepted and the knowledge 
base is inspected only if there are good reasons to do so. 
Abstract. Argumentation is a natural form of reasoning, in which two agents
cooperate in order to establish the validity of a given argument that could be
used to deduce some conclusion of interest. An interesting semantics for logical
systems of argumentation is Dung’s “preferred semantics”, which improves in
some ways on the better-known stable semantics. In this paper, we present proof
theories for the credulous decision problem associated with the preferred
semantics: is a given argument in at least one extension of a given argumentation
framework? Our proof theories improve on the one by [VP00], in the sense that
a proof for a given argument is usually shorter with our system.



1 Introduction 

Argumentation is a natural form of reasoning, in which two agents cooperate in order 
to establish the validity of a given argument that could be used to deduce some con- 
clusion of interest. During the process of argumentation, each agent forms and asserts 
arguments that contradict or undermine arguments proposed by the other agent. This 
dialogue normally goes on until one of the agents has nothing new to reply: the
original argument is then considered valid or not, depending on which agent won the
dispute, the proponent or the opponent. The connection between dialogue and argument
games has been studied by many researchers (see e.g. [CML00] for further references).
The formalization of this form of reasoning has recently captured the interest
of many researchers in the Artificial Intelligence community. For example, logics of 
argumentation are used in the construction of systems for legal reasoning, collective 
decision making or negotiation. 

Besides being interesting because they capture this natural form of reasoning, logics 
of argumentation have also turned out to generalize in some way non-monotonic 
logics (that had themselves been designed as extensions of classical logic that do not 
collapse in the presence of inconsistencies). The first formulation of a decision prob- 
lem related to Reiter’s default logic in terms of a dialogue between two agents is 
probably due to Poole [Poo89]. Similar ideas have been used for building theorem 
provers for circumscription [Prz89,Gin89]. The close connection between non- 
monotonic logics and argumentation has been formally established in [BDKT97]. 
In both non-monotonic logics and logics of argumentation, one has to evaluate a set
of pieces of knowledge (defaults or arguments) that can contradict each other. The 
computation of these contradictions can normally be performed by some classical 
theorem prover. Then, the evaluation of the defaults or of the arguments can often be 
based solely on the contradiction graph, where the vertices are the defaults or argu- 
ments, and where the directed edges represent the contradictions. One way to formal- 
ize this evaluation is to define acceptable sets of pieces of knowledge: usually, one 
would say that acceptable sets do not contain any contradiction, and are “strong 
enough” in some sense. 

The most widespread definition of acceptability associated with non-monotonic 
logics or logic programs considers that the acceptable sets are the “stable extensions”, 
which correspond to kernels of the contradiction graph [DMP97, Ber73]. However, 
the stable semantics has some features that can be undesirable in some contexts: nota- 
bly, it can happen that no set of pieces of knowledge is stable. [Dun95] defines an 
abstract framework for studying argumentation, and proposes several semantics. In 
particular, Dung’s preferred semantics seems to capture well the intuition of “strong
enough”, and avoids several drawbacks of the stable semantics. In the preferred se- 
mantics, acceptable sets of arguments are called “preferred extensions”. 

An important problem related to argumentation systems is the credulous decision 
problem: given an argument and an argumentation framework, is the argument in at 
least one acceptable set of arguments? Proof theories for that problem have been de- 
scribed by [KT99] for the preferred semantics but for a slightly different form of ar- 
gumentation systems. Proof theories in the form of argument games have also been 
described for the grounded semantics by [PS97, AC00], and by [VP00] for the
preferred semantics. [VP00] propose argument games to answer credulous queries, and
also to answer sceptical queries in a particular case. In these argument games, proofs 
that an argument belongs to some extension of an argumentation framework have the 
form of dialogues between a proponent and an opponent. An important aspect of such 
proofs is that they give an easy way to understand the implications of the underlying 
notions of acceptability. 

On the algorithmic side, [DM01] show how to optimize the enumeration of the sub- 
sets of the set of arguments in order to efficiently answer questions related to the pre- 
ferred semantics: what are the extensions, is an argument in all or some extensions? A 
careful study of the algorithm designed to answer the credulous decision problem 
shows that it can also be seen as a dialogue between an opponent and a proponent. 
Moreover, the dialogue performed by the algorithm seems to be usually shorter
than the proofs as defined by [VP00]. This prompted us to design another proof
theory for the credulous decision problem related to the preferred semantics.

We present below two proof theories for that problem, based on the dialectical 
framework of [JV99]. They both improve on the one by [VP00] in the sense that
proofs for a given argument are usually shorter with our system. 

The paper is organized as follows: the next section presents Dung’s basic definitions
and the preferred semantics. Section 3 presents a general framework for defining
dialectical proofs. Our proof theories for the credulous decision problem associated
with the preferred semantics are defined in Sect. 4. We show in Sect. 5 that the
algorithm in [DM01] can compute our proofs, and we explain why they are usually
shorter than the ones by [VP00]. Proofs of the propositions are available in [CDM01].



2 The Preferred Semantics 

Definition 1. [Dun95] An argumentation framework is a pair (A,R) where A is a set
of arguments and R is a binary relation over arguments, i.e. $R \subseteq A \times A$. Given two
arguments a and b, $(a,b) \in R$ or aRb means a attacks b (a is said to be an attacker of
b). Moreover we say that a set $S \subseteq A$ of arguments attacks an argument a if some
argument b in S attacks a. An argument $a \in A$ is self-attacking if $(a,a) \in R$. The set of
the self-attacking arguments of (A,R) is denoted by Refl(A,R) (or Refl for short).

An argumentation framework can be simply represented as a directed graph where
vertices are the arguments and edges correspond to the elements of R. Figure 1 shows
an example of argumentation framework. Given an argument $a \in A$, we denote the set
of the successors of a by $R^+(a) = \{b \in A \mid (a,b) \in R\}$, the set of its predecessors by
$R^-(a) = \{b \in A \mid (b,a) \in R\}$, and the set $R^+(a) \cup R^-(a)$ by $R^{\pm}(a)$. Moreover, given a set
$S \subseteq A$ of arguments and $\varepsilon \in \{+, -, \pm\}$, $R^{\varepsilon}(S) = \bigcup_{a \in S} R^{\varepsilon}(a)$.
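
The successor, predecessor and neighbourhood operators can be sketched in a few lines of Python. This is only an illustration: the function names `succ`, `pred`, `neigh` and `refl`, the encoding of R as a set of pairs, and the three-argument framework are assumptions made here, not part of the paper.

```python
def succ(R, S):
    """R^+(S): arguments attacked by some argument of S."""
    S = set(S)
    return {b for (a, b) in R if a in S}

def pred(R, S):
    """R^-(S): arguments that attack some argument of S."""
    S = set(S)
    return {a for (a, b) in R if b in S}

def neigh(R, S):
    """R^±(S): union of the successors and the predecessors of S."""
    return succ(R, S) | pred(R, S)

def refl(R):
    """Refl: the self-attacking arguments."""
    return {a for (a, b) in R if a == b}

# Hypothetical framework (not AF1): b attacks a, c attacks b, c attacks itself.
R_demo = {("b", "a"), ("c", "b"), ("c", "c")}
```

Operators on sets are obtained directly, since each function already takes a set S and returns the union over its elements.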



Fig. 1. The graph representation of an argumentation framework called AF1

Definition 2. Let (A,R) be an argumentation framework. An argument $a \in A$ is
defended by a set $S \subseteq A$ of arguments (or S defends a) if and only if $\forall b \in A$, if bRa then
S attacks b, i.e. $\exists c \in S$ such that cRb. A set $S \subseteq A$ is conflict-free if and only if there
are no arguments a and b in S such that a attacks b. A set $S \subseteq A$ is admissible if S is
conflict-free and $\forall x \in S$, S defends x.

Dung defines the preferred semantics of an argumentation framework by the set of 
preferred extensions. Precisely, a preferred extension is a maximal (with respect to set 
inclusion) admissible set of arguments. We characterise the preferred extensions on 
the graph representation of the argumentation framework. 

Proposition 1. Given an argumentation framework (A,R), a subset S of A is a
preferred extension if and only if the following conditions hold: 1) $R^+(S) \cap S = \emptyset$ (S is
conflict-free); 2) $R^-(S) \subseteq R^+(S)$ (S defends every element it contains); 3) for every
non-empty $X \subseteq A \setminus S$, $X \cap R^{\pm}(S \cup X) \neq \emptyset$ or $R^-(S \cup X) \not\subseteq R^+(S \cup X)$ (S is $\subseteq$-maximal
such that 1 and 2). The set of the preferred extensions of (A,R) is denoted by
Pref(A,R).
Dung exhibits interesting properties of the preferred semantics: in particular, every
admissible set is contained in a preferred extension and every argumentation frame- 
work possesses at least one preferred extension. 

The purpose of this paper is to answer an important question on preferred exten- 
sions: given an argument and an argumentation framework (A,R), is the argument in 
at least one preferred extension of (A,R), or equivalently, is the argument a credulous 
consequence of (A,R)? We define formally this notion: 

Definition 3. Given an argumentation framework (A,R) and an argument $a \in A$, a is a
credulous consequence of (A,R) under the preferred semantics if and only if a is con- 
tained in the union of the preferred extensions of (A,R). 

Example. Given AF1, argument a is defended by S={a,d,e} against b. S is conflict-
free and defends all its elements, so it is an admissible set, just like the sets $\emptyset$, {d},
{e}, {a,d}, {d,e} and {b,e}. The preferred extensions of AF1 are {a,d,e} and {b,e}. a,
b, d and e are credulous consequences of AF1 under the preferred semantics, c is not.

Since a preferred extension is a maximal admissible set, deciding if an argument a 
is contained in a preferred extension of (A,R), amounts to deciding if it is contained in 
an admissible set. A procedure to solve this decision problem can take the form of a 
game between two players, one trying to build an admissible set containing a (the 
proponent), the other one trying to show it is not possible (the opponent). 
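
The reduction from preferred extensions to admissible sets can be checked by brute force on small frameworks. The sketch below is only an illustration under assumptions made here: the function names are invented, every subset of A is enumerated (exponential cost), and the framework is a hypothetical one rather than AF1, whose graph is not reproduced in this text.

```python
from itertools import combinations

def conflict_free(R, S):
    """No argument of S attacks an argument of S."""
    return not any((a, b) in R for a in S for b in S)

def defends(A, R, S, a):
    """Every attacker of a is attacked by some argument of S."""
    return all(any((c, b) in R for c in S) for b in A if (b, a) in R)

def admissible(A, R, S):
    return conflict_free(R, S) and all(defends(A, R, S, x) for x in S)

def admissible_sets(A, R):
    items = sorted(A)
    return [frozenset(c)
            for k in range(len(items) + 1)
            for c in combinations(items, k)
            if admissible(A, R, set(c))]

def preferred_extensions(A, R):
    """Admissible sets that are maximal with respect to set inclusion."""
    adm = admissible_sets(A, R)
    return [S for S in adm if not any(S < T for T in adm)]

def credulous(A, R, a):
    """a is in some preferred extension iff a is in some admissible set."""
    return any(a in S for S in admissible_sets(A, R))

# Hypothetical framework: a and b attack each other, b attacks c, c attacks itself.
A_demo = {"a", "b", "c"}
R_demo = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "c")}
```

On this toy framework the preferred extensions are {a} and {b}, so a and b are credulous consequences while the self-attacking c is not.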

Example. We want to decide if a is a credulous consequence of AF1. The proponent
starts trying to build an admissible set containing a by advancing a. The opponent says
that a is attacked by b. The proponent defends a by advancing c. But the opponent
replies that c is attacked by e. The proponent cannot defend c against e, so he advances
d, another defender of a. The opponent has nothing to say since d is not attacked. The
proponent has built an admissible set: {a,d}, so a is a credulous consequence of AF1.



Fig. 2. Argument game to decide if a is a credulous consequence of AF1



3 The Dialectical Framework 

Our purpose is to define a dialectical proof theory for the credulous preferred seman- 
tics, which takes into account the ideas of the credulous query answering algorithm of 
[DM01]. A general method for answering such a query takes the form of an argument 
game between a proponent (PRO) and an opponent (OPP). The proponent starts with
the argument to be tested, and attempts to defend that argument against any attack
coming from the opponent. The precise rules of the argument game depend on the
semantics to be captured.

C. Cayrol, S. Doutre, and J. Mengin

Argument games have been formalised in [JV99], where a general framework is
proposed which makes it possible to define dialectical proofs for winning positions in
argumentation frameworks. The formalism developed by [JV99] encompasses the
argument games of [PS97] and provides proof theories for two interesting semantics: the
"robust" and the "defensible" semantics.

Following the methodological approach of [VP99], but with slightly different defi- 
nitions, we propose in this section a dialectical framework which will enable us to 
provide two original proof theories for the credulous preferred semantics, in section 4. 



3.1 Dialogue Type 

An argument game is formalised by a dialogue between two players PRO and OPP. A 
dialogue takes place in a given argumentation framework and is governed by rules 
expressed in a so-called "legal-move" function. 

Definition 4. A move in A is a pair [P,X] where $P \in \{\mathrm{PRO}, \mathrm{OPP}\}$ and $X \in A$. For a
move m = [P,X], we use pl(m) to denote P and arg(m) to denote X. A dialogue type is
a tuple $(A,R,\Phi)$ where (A,R) is an argumentation framework and $\Phi : A^* \rightarrow 2^A$ is a
function¹ (called "legal-move" function). A dialogue d in $(A,R,\Phi)$ (or $\Phi$-dialogue for
short) is a countable sequence $m_0 m_1 \dots$ of moves in A such that:

(i) $\mathrm{pl}(m_0) = \mathrm{PRO}$

(ii) $\mathrm{pl}(m_i) \neq \mathrm{pl}(m_{i+1})$

(iii) $\arg(m_{i+1}) \in \Phi(\arg(m_0) \dots \arg(m_i))$

d is about the argument $\arg(m_0)$.

So, each player plays in turn and PRO plays first. The next move is legal with re- 
spect to the preceding moves. 

Remark. In [JV99], a conflict-free set of arguments may appear in a move. However,
an additional requirement is that $\arg(m_{i+1})$ must attack $\arg(m_i)$.

Notations. Let $d = m_0 m_1 \dots m_i$ be a finite $\Phi$-dialogue.
$m_i$ is denoted by last(d).

$\Phi(\arg(m_0) \dots \arg(m_i))$ is denoted by $\Phi(d)$.

PRO(d) will denote the set of arguments advanced by PRO during d.

Let m be a move in A such that $m_0 m_1 \dots m_i m$ is a $\Phi$-dialogue. The extension of d with
m is denoted by the juxtaposition d m.

Any restriction can be included in the "legal-move" function as for instance: an ar- 
gument advanced in a move attacks the argument advanced in the previous move; 
PRO cannot repeat himself; no player can introduce a self-attacking argument. 
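
The three conditions of Definition 4 can be sketched as a check on finite sequences of moves. The encoding below is an assumption made for illustration: a move is a `(player, argument)` tuple, the legal-move function is any Python callable from a tuple of arguments to a set of arguments, the empty sequence is rejected because a dialogue must be about some argument, and `phi_attack_previous` is a hypothetical legal-move function implementing only the first restriction listed above.

```python
def is_dialogue(moves, phi):
    """Check conditions (i)-(iii) of Definition 4 on a finite list of moves."""
    if not moves or moves[0][0] != "PRO":
        return False                                  # (i) PRO plays first
    if any(player not in ("PRO", "OPP") for player, _ in moves):
        return False
    for i in range(len(moves) - 1):
        if moves[i][0] == moves[i + 1][0]:
            return False                              # (ii) players alternate
        history = tuple(arg for _, arg in moves[: i + 1])
        if moves[i + 1][1] not in phi(history):
            return False                              # (iii) next move is legal
    return True

# Hypothetical framework: b attacks a, c attacks b.
R_demo = {("b", "a"), ("c", "b")}

def phi_attack_previous(history):
    """Legal moves: arguments attacking the argument advanced last."""
    return {x for (x, y) in R_demo if y == history[-1]}
```

Other restrictions (no repetition by PRO, no self-attacking argument) would simply shrink the set returned by the legal-move function.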



¹ $A^*$ denotes the set of finite sequences of elements from A.
3.2 Proof Theory

As in any game, we must give winning criteria in order to determine which argument 
can be successfully defended with $\Phi$-dialogues. We consider two criteria among those
proposed by [JV99]. A given dialogue about an argument x can be won, or there can 
be a winning strategy for x, that is a way for PRO to defend x against all attacks of 
OPP. 

Definition 5. Let d be a $\Phi$-dialogue. d is won by PRO iff d is finite, cannot be
continued (that is, $\Phi(d) = \emptyset$), and last(d) is played by PRO.

The next definition is simpler but equivalent to the one proposed in [JV99]. 

Definition 6. A $\Phi$-winning strategy for x is a non-empty finite set S of finite
$\Phi$-dialogues about x such that: $\forall d \in S$, $\forall d'$ prefix² of d such that last(d') is played by
PRO, $\forall y \in \Phi(d')$, $\exists d'' \in S$ such that d'' is won by PRO and d'' is an extension of
d'[OPP,y].

The following result provides another characterization of $\Phi$-winning strategies in
which only dialogues which cannot be continued are considered.

Proposition 2. There exists a $\Phi$-winning strategy for x iff there exists a finite
non-empty set S of finite $\Phi$-dialogues about x won by PRO such that: $\forall d \in S$, $\forall d'$ prefix
of d such that last(d') is played by PRO, $\forall y \in \Phi(d')$, $\exists d'' \in S$ such that d'' is an
extension of d'[OPP,y].



4 Proof Theories for Credulous Preferred Semantics 



The combination of a dialogue type and a winning criterion determines a proof theory. 
In this section, we present two specific proof theories dedicated to the credulous deci- 
sion problem for the preferred semantics. The problem is to decide if an argument x 
belongs to a preferred extension. The basic idea is to prove that an admissible set of 
arguments can be built around x, with appropriate strategies for choosing attackers and 
defenders of the argument x. So, the "legal-move" functions we propose are inspired 
by the strategies used in the [DM01] algorithm. 

Let d be a finite $\Phi$-dialogue. $R^{\pm}(\mathrm{PRO}(d))$ contains the arguments which attack or
which are attacked by an argument advanced by PRO during d. Since PRO attempts to
build an admissible set of arguments, PRO cannot choose any argument in $R^{\pm}(\mathrm{PRO}(d))$
for pursuing the dialogue d, nor any self-attacking argument. Let POSS(d) denote the
set of arguments which may be chosen by PRO for extending the admissible set
PRO(d). Formally, $\mathrm{POSS}(d) = A \setminus (\mathrm{PRO}(d) \cup R^{\pm}(\mathrm{PRO}(d)) \cup \mathrm{Refl})$.

The role of OPP is to attack one of the previous arguments advanced by PRO in d. 
But it is useless for OPP to advance an argument which is attacked by PRO(d). 



² The sequence y is a prefix of the sequence x, or x is an extension of y, iff there exists a sequence
z such that x is obtained by the concatenation of y and z, x = yz.
4.1 The $\Phi_1$-Proof Theory

Definition 7. Let $\Phi_1 : A^* \rightarrow 2^A$ be defined by:

If $d = m_0 m_1 \dots m_{2i}$, $\Phi_1(d) = R^-(\mathrm{PRO}(d)) \setminus R^+(\mathrm{PRO}(d))$

If $d = m_0 m_1 \dots m_{2i+1}$, $\Phi_1(d) = R^-(\arg(m_{2i+1})) \cap \mathrm{POSS}(d)$

Combining $\Phi_1$ and the first winning criterion, we obtain $\Phi_1$-proofs.

Definition 8. A $\Phi_1$-proof for the argument x is a $\Phi_1$-dialogue about x won by PRO.

The following results establish the soundness and the completeness of the
$\Phi_1$-proof theory.

Proposition 3. (Soundness) If d is a $\Phi_1$-proof for the argument x, then PRO(d) is an
admissible set containing x.

Proposition 4. (Completeness) If the argument x is in a preferred extension of the
argumentation framework (A,R), then there exists a $\Phi_1$-proof for x.

Example. Let AF2 be as indicated on figure 3. Let us try to build a $\Phi_1$-proof for a. PRO
plays a. OPP can respond with b, c or d, since these arguments are predecessors but
not successors of a. Assume OPP responds with b. This argument has two attackers: i
and j, which can be advanced by PRO. Assume PRO advances i. Since $R^-(\{a,i\}) \setminus
R^+(\{a,i\}) = \{c\}$, OPP can only advance c (note that c attacks a). Then PRO responds
with f. OPP cannot play anymore, since $R^-(\{a,i,f\}) \setminus R^+(\{a,i,f\}) = \emptyset$. The dialogue
cannot be continued, it is won by PRO, thus it constitutes a $\Phi_1$-proof for a.
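
A $\Phi_1$-proof can be searched for by a plain depth-first exploration of the $\Phi_1$-dialogues. The sketch below rests on several assumptions made here: a dialogue is encoded as a tuple of arguments (even positions are PRO's moves), a self-attacking first argument is rejected outright, and the framework `A2`, `R2` only contains the attacks of AF2 that can be read off the example; argument e is left without attacks because Fig. 3 is not reproduced in this text.

```python
def succ(R, S):
    return {b for (a, b) in R if a in S}

def pred(R, S):
    return {a for (a, b) in R if b in S}

def poss(A, R, pro):
    """POSS(d): arguments PRO may still add to PRO(d)."""
    refl = {a for (a, b) in R if a == b}
    return A - (pro | succ(R, pro) | pred(R, pro) | refl)

def phi1(A, R, d):
    """Legal-move function of Definition 7; d is a tuple of arguments."""
    pro = set(d[0::2])
    if len(d) % 2 == 1:                      # last move by PRO: OPP's turn
        return pred(R, pro) - succ(R, pro)
    return pred(R, {d[-1]}) & poss(A, R, pro)

def find_phi1_proof(A, R, x, d=None):
    """Return some dialogue about x won by PRO, or None if there is none."""
    if d is None:
        if (x, x) in R:                      # assumption: no proof for a self-attacker
            return None
        d = (x,)
    legal = phi1(A, R, d)
    if len(d) % 2 == 1 and not legal:
        return d                             # OPP cannot reply: won by PRO
    for y in sorted(legal):
        proof = find_phi1_proof(A, R, x, d + (y,))
        if proof is not None:
            return proof
    return None

# Attacks of AF2 recoverable from the example (assumed, figure not available).
A2 = set("abcdefij")
R2 = {("b", "a"), ("c", "a"), ("d", "a"), ("i", "b"), ("j", "b"),
      ("i", "d"), ("j", "d"), ("f", "c")}
```

With alphabetical tie-breaking the search follows the moves of the example: a, b, i, c, f. The recursion terminates because PRO never repeats an argument, so a dialogue has at most 2|A|+1 moves.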



4.2 The $\Phi_2$-Proof Theory

In order to compare our work with argument games, it is convenient to present proofs 
in a more traditional way, where at each stage of the proof, the advanced argument 
attacks the previous one. Such proofs are obtained via the following dialogue type. 

Definition 9. Let $\Phi_2 : A^* \rightarrow 2^A$ be defined by:

If $d = m_0 m_1 \dots m_{2i}$, $\Phi_2(d) = R^-(\arg(m_{2i})) \setminus R^+(\mathrm{PRO}(d))$

If $d = m_0 m_1 \dots m_{2i+1}$, $\Phi_2(d) = R^-(\arg(m_{2i+1})) \cap \mathrm{POSS}(d)$

$\Phi_2$ is a restriction of $\Phi_1$ since, according to $\Phi_2$, OPP must advance an argument
which attacks the argument advanced in the previous move.

Combining $\Phi_2$ and the second winning criterion, we obtain $\Phi_2$-proofs.

Definition 10. A $\Phi_2$-proof for the argument x is a $\Phi_2$-winning strategy S for x such
that $\bigcup_{d \in S} \mathrm{PRO}(d)$ is conflict-free.

Both proof theories are equivalent as shown by the following result.

Proposition 5. There exists a $\Phi_1$-proof for the argument x iff there exists a $\Phi_2$-proof
for x.
Example. Given AF2, let us try to build a $\Phi_2$-proof of a. PRO plays a. a has three
attackers: b, c and d. Thus we have three $\Phi_2$-dialogues about a:

• in the first one (d1), OPP responds to a with b. Then PRO can advance i or j.
Assume he advances i. Then OPP cannot respond since $R^-(\{i\}) \setminus R^+(\{a,i\}) = \emptyset$.

• in the second one (d2), OPP responds to a with c. Then PRO can only advance f.
OPP cannot respond since $R^-(\{f\}) \setminus R^+(\{a,f\}) = \emptyset$.

• in the third one (d3), OPP responds to a with d. Then PRO can advance i or j.
Assume he advances i. Then OPP cannot respond since $R^-(\{i\}) \setminus R^+(\{a,i\}) = \emptyset$.

These three dialogues are finite, cannot be continued and are won by PRO. S = {d1,
d2, d3} is a $\Phi_2$-winning strategy for a. $\bigcup_{d \in S} \mathrm{PRO}(d)$ = {a, i, f} is conflict-free.
So S is a $\Phi_2$-proof for a.
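
Definitions 6, 9 and 10 can be turned into a checker that verifies whether a given set of dialogues is a $\Phi_2$-proof. As before, this is a sketch under assumptions made here: dialogues are tuples of arguments with PRO at even positions, the function names are invented, and `A2`, `R2` contain only the attacks of AF2 that the example lets us read off (Fig. 3 is not reproduced in this text).

```python
def succ(R, S):
    return {b for (a, b) in R if a in S}

def pred(R, S):
    return {a for (a, b) in R if b in S}

def phi2(A, R, d):
    """Legal-move function of Definition 9; d is a tuple of arguments."""
    pro = set(d[0::2])
    if len(d) % 2 == 1:                      # OPP must attack PRO's last argument
        return pred(R, {d[-1]}) - succ(R, pro)
    refl = {a for (a, b) in R if a == b}
    return pred(R, {d[-1]}) & (A - (pro | succ(R, pro) | pred(R, pro) | refl))

def is_phi2_dialogue(A, R, d):
    return len(d) >= 1 and all(d[k + 1] in phi2(A, R, d[: k + 1])
                               for k in range(len(d) - 1))

def won_by_pro(A, R, d):
    return len(d) % 2 == 1 and not phi2(A, R, d)

def is_phi2_proof(A, R, x, S):
    """S is a winning strategy for x whose PRO arguments are conflict-free."""
    S = [tuple(d) for d in S]
    if not S or any(not d or d[0] != x or not is_phi2_dialogue(A, R, d) for d in S):
        return False
    for d in S:
        for k in range(1, len(d) + 1, 2):    # prefixes ending with a PRO move
            prefix = d[:k]
            for y in phi2(A, R, prefix):
                ext = prefix + (y,)
                if not any(e[: len(ext)] == ext and won_by_pro(A, R, e) for e in S):
                    return False
    pro_all = set().union(*(set(d[0::2]) for d in S))
    return not any((p, q) in R for p in pro_all for q in pro_all)

# Attacks of AF2 recoverable from the example (assumed, figure not available).
A2 = set("abcdefij")
R2 = {("b", "a"), ("c", "a"), ("d", "a"), ("i", "b"), ("j", "b"),
      ("i", "d"), ("j", "d"), ("f", "c")}
S_demo = [("a", "b", "i"), ("a", "c", "f"), ("a", "d", "i")]
```

The set `S_demo` encodes d1, d2 and d3 from the example; dropping any of the three dialogues leaves one of OPP's replies unanswered, so the remaining set is no longer a winning strategy.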
Fig. 3. The argumentation framework AF2, a $\Phi_1$-proof and a $\Phi_2$-proof for a



5 Related Works 

5.1 The Credulous Query Answering Algorithm of [DM01] 

We have proposed in [DM01] an algorithm to answer credulous queries related to the
preferred semantics, of the form: given an argumentation framework (A,R) and an
argument a in A, is a in at least one preferred extension of the framework? We have
already mentioned that what prompted us to investigate a new proof theory is the fact
that the proofs computed by our algorithm are in general shorter than the proofs of
[VP00]. In this section, we show that our algorithm computes $\Phi_1$-proofs.

Let us first describe our algorithm. It is based on an enumeration of the subsets of
A, which is performed by exploring a binary tree. Each node of the tree is labelled
with a partition (I, O, U) of A: I is the set of arguments that have been put so far In the
extension being built, O is the set of arguments that have been put Out of it, and U is
the set of arguments that are still Undecided at that stage. Since a preferred extension
must be conflict-free, the tree explores only nodes such that $R^{\pm}(I) \subseteq O$; in this case we
say that (I,O,U) is an R-candidate. Normally, a node labelled with an R-candidate
C = (I,O,U) such that $U \neq \emptyset$ has two children: one is labelled with
$C+x = (I \cup \{x\},\ O \cup R^{\pm}(x),\ U \setminus (\{x\} \cup R^{\pm}(x)))$, the other one with
$C-x = (I,\ O \cup \{x\},\ U \setminus \{x\})$, for some $x \in U$. Since we look for a preferred extension
containing a, the root of the tree is labelled with the partition
$(\{a\},\ R^{\pm}(a),\ A \setminus (\{a\} \cup R^{\pm}(a)))$.

The set $Op = R^-(I) \setminus R^+(I)$ denotes the set of arguments which are predecessors but
not successors of I, that is, these arguments attack I and I cannot defend itself against
them. When a partition associated to a node n is such that Op is empty, then I is an
admissible set: the computation has proved that a is in at least one preferred extension.
When the partition is such that there is some $y \in Op$ which has no more undecided
predecessor, then no preferred extension can be found on that branch, because an
argument in I will never be defended by $(I \cup U)$: the exploration of that branch can be
stopped. Thus the algorithm can be sketched in a functional programming style as
follows:

function PrefEnum(R, C)

parameters: a binary relation R, and an R-candidate C = (I,O,U)
result: $\top$ if I is contained in at least one preferred extension, $\bot$ otherwise

if $Op = \emptyset$ then $\top$

elseif there exists $x \in Op$ such that $R^-(x) \subseteq O$ then $\bot$
elseif there exists $y \in R^-(Op)$ such that $yRy$ then PrefEnum(R, C-y)
else for some $x \in Op$ such that $R^-(x) \not\subseteq O$ do
    select some $y \in R^-(x) \setminus O$
    return (PrefEnum(R, C+y) or PrefEnum(R, C-y))
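
The sketch above can be rendered in Python roughly as follows. This is an illustrative reading, not the implementation of [DM01]: the names are invented here, ties are broken alphabetically, the self-attacking predecessor is additionally required to be undecided (an assumption that guarantees U shrinks at each call), and a self-attacking query argument is rejected before the enumeration starts. The framework `A2`, `R2` holds only the attacks of AF2 that can be read off the examples.

```python
def succ(R, S):
    return {b for (a, b) in R if a in S}

def pred(R, S):
    return {a for (a, b) in R if b in S}

def pref_enum(R, I, O, U):
    """True iff I is contained in some admissible set built from I and U."""
    op = pred(R, I) - succ(R, I)             # attackers of I not yet counter-attacked
    if not op:
        return True                          # I is admissible
    if any(pred(R, {x}) <= O for x in op):
        return False                         # some attacker can never be attacked
    for y in sorted(pred(R, op) & U):
        if (y, y) in R:                      # a self-attacking defender is useless
            return pref_enum(R, I, O | {y}, U - {y})
    x = min(op)
    y = min(pred(R, {x}) - O)                # an undecided attacker of x
    ny = succ(R, {y}) | pred(R, {y})
    return (pref_enum(R, I | {y}, O | ny, U - ({y} | ny))      # branch C+y
            or pref_enum(R, I, O | {y}, U - {y}))              # branch C-y

def credulous(A, R, a):
    """Call PrefEnum on the root partition ({a}, R^±(a), A \\ ({a} ∪ R^±(a)))."""
    if (a, a) in R:                          # assumption: reject self-attackers
        return False
    na = succ(R, {a}) | pred(R, {a})
    return pref_enum(R, {a}, na, A - ({a} | na))

# Attacks of AF2 recoverable from the examples (assumed, figure not available).
A2 = set("abcdefij")
R2 = {("b", "a"), ("c", "a"), ("d", "a"), ("i", "b"), ("j", "b"),
      ("i", "d"), ("j", "d"), ("f", "c")}
```

Each recursive call removes at least one argument from U, so the recursion depth is bounded by |A|.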

It is often possible to choose a “good” y such that only one branch has to be explored,
but this is not relevant here. We will now show that when PrefEnum is called on
$(\{a\},\ R^{\pm}(a),\ A \setminus (\{a\} \cup R^{\pm}(a)))$, if the computation returns $\top$ then it is possible to
extract a $\Phi_1$-proof for a from the successful branch.

Proposition 6. Let (A,R) be an argumentation framework, and let a be some element
of the union of the preferred extensions of (A,R). Let C(0), C(1), ..., C(n) be the
sequence of R-candidates that label the nodes from the root to the success leaf of the tree
explored by PrefEnum when called on $(\{a\},\ R^{\pm}(a),\ A \setminus (\{a\} \cup R^{\pm}(a)))$. For $0 \le j \le n$, C(j)
is of the form C(j) = (I(j), O(j), U(j)); let $Op(j) = R^-(I(j)) \setminus R^+(I(j))$. Let $(j_1, \dots, j_m)$
be the longest subsequence of (1, ..., n) such that for every $1 \le i \le m$, $C(j_i)$ is of the
form $C(j_i - 1) + y(i)$, where $y(i) \in R^-(x(i)) \setminus O(j_i - 1)$ for some $x(i) \in Op(j_i - 1)$ with
$R^-(x(i)) \not\subseteq O(j_i - 1)$.
Let d = (c(0), ..., c(2m)) be the dialogue defined by:

• c(0) = [PRO,a]

• c(2i-1) = [OPP,x(i)] and c(2i) = [PRO,y(i)] for $1 \le i \le m$.

Then d is a $\Phi_1$-proof for a.

Example. Let us illustrate the above algorithm on the argumentation framework AF2
of figure 3. We look for a preferred extension containing a, so the root of the tree is
labelled with the partition C(0) = ({a}, {b,c,d}, {e,f,i,j}). We have Op(0) = {b,c,d}. We
select $i \in R^-(b)$ and we build two branches. One child of the root is labelled with
C(0)+i, the other one is labelled with C(0)-i. We look at the one labelled with
C(0)+i = C(1) = ({a,i}, {b,c,d}, {e,f,j}). In this case, Op(1) = {c}. We select $f \in R^-(c)$
and we build two branches. One child of the node associated with the partition C(1) is
labelled with C(1)+f = C(2) = ({a,i,f}, {b,c,d}, {e,j}). Op(2) = $\emptyset$, this is a leaf of
success. The $\Phi_1$-proof for a shown on figure 3 can be easily extracted from this
successful branch, given that x(1)=b, y(1)=i, x(2)=c, y(2)=f.



5.2 Comparison with [VP00]

[VP00] have proposed an argument game for the credulous decision problem, related
to the preferred semantics. The precise rules of that game are the following ones:

The argument advanced in a move (except the first one), attacks one of the previous 
arguments of the other player. A dispute is a succession of moves satisfying the fol- 
lowing conditions: PRO plays first. A line of dispute is a succession of moves such 
that each player plays in turn and attacks the previous argument proposed by the other 
player. In a same line of dispute, OPP cannot repeat himself, but OPP can repeat an 
argument already advanced by PRO. Each player can backtrack; backtracking consists 
in opening a new line of dispute. OPP can repeat himself in different lines of dispute. 
PRO can repeat himself but he cannot repeat an argument already proposed by OPP. 

OPP wins a dispute if PRO cannot respond to the last move of OPP, or if the last
move of OPP is an argument already proposed by PRO and on which there was no
backtracking. PRO wins a dispute if OPP does not win.

A given argument is contained in a preferred extension if and only if each dispute 
beginning with that argument is won by PRO. 

Example (from [VP00]). Let AF3 be as indicated on figure 4. There are two disputes
starting with f. Let us build one of them, as done in [VP00]. PRO plays f. OPP
advances n. PRO can respond with i or j. Assume PRO responds with i. Then OPP
attacks i with j. To defend i against j, PRO repeats i. According to the [VP00] rules, the
dispute cannot be continued; it is won by PRO. Note that it reduces to a single line of
dispute. The second dispute starting with f is obtained by exchanging i and j. So, it is
also won by PRO.

Similarly, there are two $\phi_1$-proofs for f, which are also $\phi_2$-proofs. Each one is shorter
than the corresponding dispute. The first three moves are the same. The $\phi_1$-proof
stops at the third move, since at that stage, $R^-(\{f,i\}) \setminus R^+(\{f,i\}) = \emptyset$.
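
The stopping condition above can be checked mechanically. The following is a minimal sketch, not the authors' algorithm: the attack relation is only what can be read off the dispute described in the text (n attacks f; i and j attack n; i and j attack each other), so the lost figure may contain further arguments or attacks, and the helper names are hypothetical.

```python
# Attack relation of AF3 as reconstructed from the dispute in the text
# (assumption: the original figure may contain further attacks).
ATTACKS = {("n", "f"), ("i", "n"), ("j", "n"), ("i", "j"), ("j", "i")}


def attackers(args, attacks):
    """R^-(args): arguments that attack some member of args."""
    return {x for (x, y) in attacks if y in args}


def attacked(args, attacks):
    """R^+(args): arguments attacked by some member of args."""
    return {y for (x, y) in attacks if x in args}


def is_conflict_free(args, attacks):
    """No member of args attacks another member of args."""
    return not (attacked(args, attacks) & set(args))


def is_admissible(args, attacks):
    """Conflict-free, and every attacker of args is attacked by args."""
    undefended = attackers(args, attacks) - attacked(args, attacks)
    return is_conflict_free(args, attacks) and not undefended


print(attackers({"f", "i"}, ATTACKS) - attacked({"f", "i"}, ATTACKS))  # set()
print(is_admissible({"f", "i"}, ATTACKS))  # True
print(is_admissible({"f"}, ATTACKS))       # False
```

Under these assumptions the set {f, i} has no undefended attacker, which is why no further move is needed after the third one.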

Example (from [VP00]). Let AF4 be as indicated on figure 5. Let us build a dispute lost
by PRO, starting with m, as done in [VP00]. PRO plays m. OPP responds with l. PRO
can advance p or k. Assume PRO advances p. Then OPP responds with h. PRO cannot
respond to h since h has no predecessor. This line of dispute cannot be continued, but
we can open another one where PRO proposes k in response to l. But then OPP attacks
k with m and PRO cannot respond, since PRO is not allowed to repeat l, already
proposed by OPP. Consequently, m is not a credulous consequence of AF4.

We reach this conclusion faster with our $\phi_2$-proof theory. Actually, to respond to l
advanced by OPP, PRO can only play p: k is attacked by m, an argument previously
advanced by PRO. OPP attacks p with h and the dialogue cannot be continued. This
dialogue is lost by PRO. It corresponds to the first line of dispute.





[Figure 4 panels: AF3; $\phi_1$-proof for f; dispute for f]

Fig. 4. The argumentation framework AF3, a dispute and a $\phi_1$-proof for f




Fig. 5. The argumentation framework AF4, the $\phi_2$-dialogue and the dispute about m



Our proofs are shorter than [VP00]’s proofs for a simple reason: we eliminate the 
arguments which are successors of PRO(d) from the possible moves of PRO and OPP. 
It is clear that such arguments are rejected from any admissible set containing PRO(d), 
so it is useless for OPP to advance these arguments (see figure 4). Moreover, such 
arguments are forbidden for PRO if we want PRO(d) to be contained in a conflict-free 
set (see figure 5). 



6 Conclusion 

The work reported in this paper concerns the credulous decision problem related to
the preferred semantics. We have proposed two proof theories for that problem. These
proof theories are presented in a dialectical framework, as done in [JV99]. Both proof
theories improve on the work by [VP00] in the sense that our proofs for a given
argument are usually shorter than the argument games proposed by [VP00].

Moreover, we have shown that our proofs can be computed by the credulous query
answering algorithm of [DM01]. These results suggest following the same approach
for the skeptical decision problem, since [DM01] also present an efficient skeptical
query answering algorithm. We plan to design a dialectical proof theory for the
skeptical preferred semantics.

Abstract. This paper presents the QRK system for reasoning under uncertainty,
which combines the building of logical arguments for formulae
with infinitesimal probabilities of the kind handled by the kappa calculus.
Each constituent of an argument has an associated $\kappa$-value which
captures belief in that component, and these values are combined when
arguments are constructed from the components. The paper is an extension
of our previous work on systems of argumentation which reason with
qualitative probabilities, providing a finer-grained approach to handling
uncertainty.



1 Introduction 

In the last few years there have been a number of attempts to build systems for 
reasoning under uncertainty that are of a qualitative nature — that is they use 
qualitative rather than numerical values, dealing with concepts such as increases 
in belief and the relative magnitude of values. Three main classes of system can 
be distinguished — systems of abstraction, infinitesimal systems, and systems of 
argumentation. In systems of abstraction, the focus is mainly on modelling how 
the probability of hypotheses changes when evidence is obtained. Such systems 
provide a qualitative abstraction of probabilistic networks, known as qualitative 
probabilistic networks (QPNs), which is sufficient for planning [14], explanation 
[3] and prediction [11] tasks. Infinitesimal systems deal with beliefs that are 
very nearly 1 or 0, providing formalisms that handle order of magnitude prob- 
abilities. Such systems may be used for diagnosis [2] and have been extended 
with infinitesimal utilities to give complete decision theories [12,15]. Systems of 
argumentation are based on the idea of constructing logical arguments for and 
against formulae. Such systems have been applied to problems such as diagnosis, 
protocol management and risk assessment [5], as well as handling inconsistent 
information [1], and providing a framework for default reasoning [4,9]. 

In this paper we provide a hybridisation of infinitesimal systems and sys- 
tems of argumentation, by defining a system of argumentation which uses order 
of magnitude probabilities, in particular the values manipulated by the kappa 
calculus. 

2 Kappa Calculus

The kappa calculus is a formalism that makes it possible to handle order of
magnitude probabilities, representing a state of belief by means of a ranking $\kappa$
that maps propositions into a class of ordinals. This mapping is such that:

$$\kappa(\mathit{true}) = 0 \tag{1}$$

$$\kappa(\phi \vee \psi) = \min(\kappa(\phi), \kappa(\psi)) \tag{2}$$

According to the kappa calculus, a proposition $a$ is believed to degree $s$ if
$\kappa(\neg a) = s$; is disbelieved to degree $s$ if $\kappa(a) = s$; and is uncommitted if
$\kappa(a) = \kappa(\neg a) = 0$. When accommodating disbelieved evidence, the choice about which
beliefs have to be retracted depends on their strength.

The $\kappa$ ranking also has the following properties, analogous to the familiar
properties for probability distributions [7]:

$$\kappa(\phi) = \min_{\omega \models \phi} \kappa(\omega) \tag{3}$$

$$\kappa(\psi \mid \phi) = \kappa(\psi \wedge \phi) - \kappa(\phi) \tag{4}$$
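
To make properties (1) to (4) concrete, here is a small sketch that ranks the four worlds over two atoms and derives the rank of a proposition from its models. The ranking values and the function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical ranking over the four worlds of two atoms (a, b); the numbers
# are illustrative and are not taken from the paper.
WORLD_RANK = {
    (True, True): 0,
    (True, False): 1,
    (False, True): 2,
    (False, False): 3,
}


def kappa(prop):
    """Property (3): the rank of a proposition is the minimum rank of its models."""
    ranks = [rank for world, rank in WORLD_RANK.items() if prop(*world)]
    return min(ranks) if ranks else math.inf


def kappa_cond(psi, phi):
    """Property (4): kappa(psi | phi) = kappa(psi and phi) - kappa(phi).

    Assumes phi has at least one model, so that kappa(phi) is finite.
    """
    return kappa(lambda a, b: psi(a, b) and phi(a, b)) - kappa(phi)


def is_a(a, b):
    return a


def not_a(a, b):
    return not a


def is_b(a, b):
    return b


print(kappa(lambda a, b: True))      # 0, property (1)
print(kappa(is_a), kappa(not_a))     # 0 2
print(kappa_cond(is_b, not_a))       # 0
```

With this ranking $\kappa(\neg a) = 2$, so in the terminology above $a$ is believed to degree 2.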



Typically $\kappa$-values are assumed to be obtained from probabilities, by a form of
order of magnitude abstraction in which all probabilities within a given order
of magnitude are mapped to the same $\kappa$-value. Following Spohn [13], one can
relate a probability $p$ to a $\kappa$-value $\kappa$ by:

$$\epsilon < \frac{p}{\epsilon^{\kappa}} \leq 1$$

which of course is equivalent to:

$$\epsilon^{\kappa+1} < p \leq \epsilon^{\kappa}.$$

One procedure to map probabilities into $\kappa$-values is [2]:

1. If $p = 0$ then print $\infty$;

2. $k \leftarrow 0$;

3. $p \leftarrow p/\epsilon$;

4. If $p > 1$, then print $k$, otherwise $k \leftarrow k + 1$;

5. Goto 3;

and an alternative mapping has been suggested by Giang and Shenoy [6]. For
this work we assume that either such a mapping has already been applied, or
that the $\kappa$-values have been elicited directly; we just assume the existence of a
set of $\kappa$-values for the propositions we are interested in.
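
The five-step procedure can be sketched as a loop. This is an illustrative reading rather than the code of [2]: the function name, the parameter name `eps` and its default value are assumptions, and the sketch returns the value instead of printing it.

```python
import math


def kappa_from_probability(p: float, eps: float = 0.1) -> float:
    """Map a probability p to its kappa value for a given base eps.

    Mirrors the steps of the procedure: divide p by eps repeatedly and
    count how many divisions are needed before the value exceeds 1.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    if p == 0.0:              # step 1
        return math.inf
    k = 0                     # step 2
    while True:
        p = p / eps           # step 3
        if p > 1.0:           # step 4
            return k
        k += 1                # steps 4 and 5: increment and repeat


print(kappa_from_probability(0.5))         # 0
print(kappa_from_probability(0.05))        # 1
print(kappa_from_probability(0.25, 0.5))   # 2
```

For example, with `eps = 0.5` the probability 0.25 satisfies $0.5^{3} < 0.25 \leq 0.5^{2}$ and is therefore mapped to 2.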

3 The QRK System 

Having introduced the kappa calculus, we can start to introduce the system of
argumentation which will use $\kappa$-values.

3.1 Basic Concepts

We start with a set of atomic propositions $\mathcal{L}$. We also have a set of connectives
$\{\neg, \rightarrow\}$, and the following set of rules for building the well-formed formulas (wffs)
of the language.

1. If $l \in \mathcal{L}$ then $l$ is a well-formed simple formula (swff).

2. If $l$ is an swff, then $\neg l$ is an swff.

3. If $l$ and $m$ are swffs, then $l \rightarrow m$ is a well-formed implicational formula
(iwff).

We denote the set of all swffs which can be derived using $\mathcal{L}$ by $S_{\mathcal{L}}$, while $I_{\mathcal{L}}$
denotes the corresponding set of iwffs. The set of wffs that can be defined using
$\mathcal{L}$ is $W = S_{\mathcal{L}} \cup I_{\mathcal{L}}$. $W$ may then be used to build up a database $\Delta$ where every
item $d \in \Delta$ is a triple $(i : l : s)$ in which $i$ is a token which uniquely identifies
the database item (for convenience we will use the letter 'i' as an anonymous
identifier), $l$ is a wff, and $s$ gives information about the degree of belief associated
with $l$. In particular we distinguish two cases:

- $l$ is an swff: In this case $s$ is the pair expressing the degree of belief associated
with $l$ and the degree of disbelief associated with $\neg l$, that is $(\kappa(l), \kappa(\neg l))$;

- $l$ is an iwff: In this case $\rightarrow$ does not represent material implication but that
the antecedent of the wff has a probabilistic influence on the consequent.
Therefore, the sign $s$ indicates the belief in the consequent given the
antecedent. Thus each iwff has associated with it a sign $s$ which is the ordered
set of four conditional $\kappa$-values: $(\kappa(m|l), \kappa(m|\neg l), \kappa(\neg m|l), \kappa(\neg m|\neg l))$.

Note that there is a notion of direction, similar to that in the directed arcs of
probabilistic networks, associated with iwffs.



3.2 The Proof Theory 

In the previous section we introduced a language for describing belief influences
between formulae. For this to be useful we need to give a mechanism for taking
sentences in that language and using them to derive new sentences. In particular
we need to be able to take formulae with associated $\kappa$-values and use these
to derive new formulae and their associated $\kappa$-values. This is done using the
consequence relation $\vdash_{QRK}$ which is defined in Figure 1.

The definition is in the form of Gentzen-style proof rules where the
antecedents are written above the line and the consequent is written below. The
consequence relation operates on a database of the kind of triples introduced in
Section 3.1 and derives arguments about formulae from them. The concept of
an argument is formally defined as follows:

Definition 1. An argument for a well-formed formula $p$ from a database $\Delta$ is
a triple $(p, G, Sg)$ such that $\Delta \vdash_{QRK} (p, G, Sg)$.

$$\text{Ax} \quad \frac{(i : St : Sg) \in \Delta}{\Delta \vdash_{QRK} (St, \{i\}, Sg)}$$

$$\neg\text{E} \quad \frac{\Delta \vdash_{QRK} (\neg St, G, Sg)}{\Delta \vdash_{QRK} (St, G, \mathrm{neg}(Sg))}$$

$$\neg\text{I} \quad \frac{\Delta \vdash_{QRK} (St, G, Sg)}{\Delta \vdash_{QRK} (\neg St, G, \mathrm{neg}(Sg))}$$

$$\rightarrow\!\text{E} \quad \frac{\Delta \vdash_{QRK} (St, G, Sg) \qquad \Delta \vdash_{QRK} (St \rightarrow St', G', Sg')}{\Delta \vdash_{QRK} (St', G \cup G', \mathrm{imp}_{\rightarrow}(Sg, Sg'))}$$

$$\rightarrow\!\text{R} \quad \frac{\Delta \vdash_{QRK} (St', G, Sg) \qquad \Delta \vdash_{QRK} (St \rightarrow St', G', Sg')}{\Delta \vdash_{QRK} (St, G \cup G', \mathrm{imp}_{\leftarrow}(Sg, Sg'))}$$

Fig. 1. The consequence relation $\vdash_{QRK}$



The sign $Sg$ of the argument denotes something about the degree of belief
associated with the formula $p$, while the grounds $G$ identify the elements of the
database used in the derivation of $p$.

To see how the idea of an argument fits in with the proof rules in Figure 1,
let us consider the rules Ax and $\rightarrow$E. The first builds an argument from a triple
$(i : St : Sg)$, which has a sign $Sg$ and a set of grounds $\{i\}$, where the grounds
identify which elements from the database are used in the derivation. This rule
is a kind of bootstrap mechanism to allow the elements of the database to be
turned into arguments to which other rules can be applied. The second, $\rightarrow$E,
can be thought of as analogous to modus ponens. From an argument for $St$ and
an argument for $St \rightarrow St'$ it is possible to build an argument for $St'$ once the
necessary book-keeping with grounds and signs has been carried out.

3.3 Combination Functions 

In order to apply the proof rules of Figure 1 to build arguments, it is necessary 
to supply the functions used to combine signs. These are provided in this section. 

The rules for handling negation are applicable only to swffs and permit
negation to be either introduced or eliminated by altering the sign, for example
allowing $(i : a : Sg)$ to be rewritten as $(i : \neg a : Sg')$. This leads to the definition
of neg:

Definition 2. The function neg: $Sg \in [0, \infty[ \times [0, \infty[ \; \mapsto \; Sg' \in [0, \infty[ \times [0, \infty[$ is
defined as follows:

If $Sg = (s, s')$

Then $Sg' = (s', s)$

To deal with implication we need two elimination functions $\mathrm{imp}_{\rightarrow}$ and $\mathrm{imp}_{\leftarrow}$,
where the former establishes the sign of formulae generated by the rule of
inference $\rightarrow$E, while the latter is used to establish the sign of formulae generated by
$\rightarrow$R. We start by discussing $\mathrm{imp}_{\rightarrow}$. Let us suppose we have an implicational
formula $(i : a \rightarrow b : Sg)$ where $Sg$ is the quadruple of $\kappa$-values:

$$(\kappa(b|a), \kappa(b|\neg a), \kappa(\neg b|a), \kappa(\neg b|\neg a))$$

If we have the swff

$$(j : a : (\kappa(a), \kappa(\neg a)))$$

then by applying the rule $\mathrm{imp}_{\rightarrow}$ we can obtain $b$ and the pair $(\kappa(b), \kappa(\neg b))$.
In order to do so we have to combine $(\kappa(b|a), \kappa(b|\neg a), \kappa(\neg b|a), \kappa(\neg b|\neg a))$ with
$(\kappa(a), \kappa(\neg a))$.

Definition 3. The function $\mathrm{imp}_{\rightarrow}$: $Sg \in [0, \infty[ \times [0, \infty[ \; \times \; Sg' \in [0, \infty[^4 \; \mapsto \;
Sg'' \in [0, \infty[ \times [0, \infty[$ is defined as follows:

If $Sg = (s, s')$

$Sg' = (r, r', t, t')$

Then $Sg'' = (w, w')$

where:

$$w = \min(r + s, r' + s')$$
$$w' = \min(t + s, t' + s')$$

These two equalities are obtained by turning the probabilities in Jeffrey’s rule
[8] into $\kappa$-values.
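
The two combination functions defined so far translate directly into code. The sketch below is illustrative only: the function names `neg` and `imp_elim`, the type aliases and the sample signs are assumptions, and the numbers are hypothetical rather than taken from the paper.

```python
from typing import Tuple

Pair = Tuple[float, float]                 # (kappa(x), kappa(not x))
Quad = Tuple[float, float, float, float]   # (k(b|a), k(b|~a), k(~b|a), k(~b|~a))


def neg(sg: Pair) -> Pair:
    """Definition 2: swap the two components of the sign."""
    s, s_prime = sg
    return (s_prime, s)


def imp_elim(sg: Pair, sg_prime: Quad) -> Pair:
    """Definition 3: sign of b from the sign of a and the sign of a -> b."""
    s, s_prime = sg
    r, r_prime, t, t_prime = sg_prime
    w = min(r + s, r_prime + s_prime)          # kappa(b)
    w_prime = min(t + s, t_prime + s_prime)    # kappa(not b)
    return (w, w_prime)


# Hypothetical signs, chosen only for illustration.
sign_a = (0, 2)                  # (kappa(a), kappa(not a))
sign_a_implies_b = (0, 3, 1, 0)  # quadruple attached to a -> b

print(imp_elim(sign_a, sign_a_implies_b))  # (0, 1)
print(neg(sign_a))                         # (2, 0)
```

Here $\kappa(b) = \min(0 + 0, 3 + 2) = 0$ and $\kappa(\neg b) = \min(1 + 0, 0 + 2) = 1$.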

The function $\mathrm{imp}_{\leftarrow}$ is obtained by computing Pr(a) by manipulating Jeffrey’s
rule for probabilities with Bayes’ rule and then by mapping this expression into
kappa calculus.

Definition 4. The function $\mathrm{imp}_{\leftarrow}$: $Sg \in [0, \infty[ \times [0, \infty[ \; \times \; Sg' \in [0, \infty[^4 \; \mapsto \; Sg'' \in
[0, \infty[ \times [0, \infty[$ is defined as follows:

If $Sg = (s, s')$

$Sg' = (r, r', t, t')$

Then $Sg'' = (w, w')$

where:

$$w = \min\{s - \min(r, r' - 1),\; r' - 1 - \min(r, r' - 1)\}$$

and

$$w' = \begin{cases} \min\{s - \min(r', r - 1),\; r - 1 - \min(r', r - 1)\} & \text{if } w \neq 0 \\ \infty & \text{otherwise} \end{cases}$$



3.4 Soundness and Completeness 

In order to prove soundness and completeness we first need to capture the kind 
of relationships that may hold between two formulae: 

Definition 5. A well-formed formula $p$ is said to be a cause of a well-formed
formula $q$ if and only if it is possible to identify an ordered set of iwffs
$\{p \rightarrow c_1, c_1 \rightarrow c_2, \ldots, c_n \rightarrow q\}$.

That is, $p$ is a cause of $q$ if it is possible to build up a trail of (causally directed)
implications linking $p$ to $q$.

Definition 6. A well-formed formula $p$ is said to be an effect of a well-formed
formula $q$ if and only if $q$ is a cause of $p$.

Thus $p$ is an effect of $q$ if it is possible to build up a trail of (causally directed)
implications linking $q$ to $p$. Soundness will relate to ensuring that given
information about the $\kappa$-value of a particular formula we can compute the correct
$\kappa$-value of its causes and effects, and completeness will relate to ensuring that
we can compute the $\kappa$-values of all such causes and effects.
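
Definitions 5 and 6 amount to reachability along the implications. The following sketch assumes that iwffs are stored as (antecedent, consequent) pairs over atoms and ignores negation; the function names are hypothetical, and the sample rules are the four implications of the example in Section 4.

```python
from collections import deque

# iwffs written as (antecedent, consequent); these are the implications
# r1..r4 of the Section 4 example, with their signs omitted.
IWFFS = [("C", "S"), ("R", "S"), ("T", "S"), ("R", "T")]


def is_cause(p, q, iwffs):
    """Definition 5: a chain of one or more implications leads from p to q."""
    frontier = deque([p])
    seen = set()
    while frontier:
        current = frontier.popleft()
        for antecedent, consequent in iwffs:
            if antecedent == current and consequent not in seen:
                if consequent == q:
                    return True
                seen.add(consequent)
                frontier.append(consequent)
    return False


def is_effect(p, q, iwffs):
    """Definition 6: p is an effect of q exactly when q is a cause of p."""
    return is_cause(q, p, iwffs)


print(is_cause("R", "S", IWFFS))   # True
print(is_cause("C", "T", IWFFS))   # False
print(is_effect("S", "R", IWFFS))  # True
```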

Before proceeding to prove soundness and completeness, we need to take into
account two problems which can arise when doing evidential reasoning, that is
reasoning both in the direction of the implications and in the opposite direction.
We are able to use evidential reasoning because the rule $\rightarrow$R is included
in the consequence relation. The first problem arises because when implications
are reversed, the proof procedure can loop and therefore build an infinite
number of arguments. This is possible even if we have a single iwff since there is
nothing to stop the proof procedure alternately applying $\rightarrow$E and $\rightarrow$R forever,
building a new argument from each application. However, the problem can be
easily solved by introducing the concept of a minimal argument as in [10]:

Definition 7. A minimal argument is an argument in which no iwff appears 
more than once. 

We then reject non-minimal arguments, as we shall see below. 

The second problem to deal with is caused by the need to handle conditional 
independence in the proper way. If proof rules are applied blindly then it is pos- 
sible to build arguments which do not respect conditional independence. Such 
arguments would not be valid according to the kappa calculus, so they need to 
be eliminated. To identify arguments that are invalid because of conditional in- 
dependence we introduce the notion of d-separation from probabilistic networks, 
suitably modified for $\kappa$-values. However, before proceeding any further we first
need to introduce some additional definitions:

Definition 8. A source of an argument $(p, G, Sg)$ is an swff from $G$.

That is, a source of an argument is one of the simple formulae which ground it,
and therefore is the head of a chain of implications. In the same way we define
the destination of an argument as:

Definition 9. The destination of an argument $(p, G, Sg)$ is $p$.

We then define d-separation as follows: 

Definition 10. Two formulae $p$ and $q$ are d-separated if for all arguments
which have $p$ as their source and $q$ as their destination, there is another
formula $r$ such that either:

1. $p$ is a cause of $r$, $r$ is a cause of $q$, and either $r$ or $\neg r$ is known to be true;
or

2. $p$ is an effect of $r$, $q$ is an effect of $r$, and either $r$ or $\neg r$ is known to be true;
or

3. $p$ and $q$ are both causes of $r$ and there is no argument $(r, G, Sg)$ such that
all the swffs in $G$ are effects of $r$.

We are now in a position to define the subset of all arguments which do not 
suffer from the two problems we discussed above: 

Definition 11. An argument A = (p, G, Sg) is invalid if any of the sources of 
A are d-separated from p. 

and consequently 

Definition 12. An argument A = (p, G, Sg) is valid if it is not invalid. 

The set of minimal valid arguments is then the problem-free subset of all
possible arguments which can be built from some database of triples.

Now, because arguments in QRK typically only indicate a degree of belief in
a formula (rather than indicating that it is true or false), in general there will
be several minimal valid arguments concerning it with differing degrees of belief.
To combine these, we define a flattening function, and we do this in a way such
that only minimal and valid arguments are taken into account. This function,
Flat($\cdot$), is a mapping from a set of arguments for a formula $St$ built from a
particular database $\Delta$ to the pair of that proposition and some overall measure
of validity. Thus we have:

$$\mathbf{A}^{\Delta}_{St} = \{(St, G_i, Sg_i) \mid \Delta \vdash_{QRK} (St, G_i, Sg_i)\}$$

and then

$$\mathrm{Flat} : \{A \mid A \text{ is minimal and valid}\} \mapsto (St, v)$$

where $v$ is a single pair of $\kappa$-values, $(\kappa(St), \kappa(\neg St))$. The value $v$ is then the result
of a suitable combination of all the signs of all the arguments for $St$:

$$v = \mathrm{MIN}_i(\{Sg_i \mid (St, G_i, Sg_i) \in \mathbf{A}'^{\Delta}_{St}\})$$

where each $Sg_i$ is a pair $(\kappa(St), \kappa(\neg St))$, $\mathbf{A}'^{\Delta}_{St}$ is the set of all minimal, valid
arguments in $\mathbf{A}^{\Delta}_{St}$, and the function $\mathrm{MIN}_i$ is defined as follows:

$$\mathrm{MIN}_i((\kappa(a_i), \kappa(\neg a_i))) = (\min_i \kappa(a_i), \max_i \kappa(\neg a_i))$$

This definition of the flattening function is motivated by the fact that if we have
different arguments, we want to consider the most plausible one, that is, we tend
to choose the one associated with the most normal world, therefore the one for
which it holds that $a$ is highly believed while $\neg a$ is highly disbelieved.
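
The combination performed by $\mathrm{MIN}_i$ is small enough to sketch directly. The function name `flatten` is hypothetical, and the sketch assumes that the signs passed in already belong to minimal, valid arguments; the two sample signs are those of the arguments for T in the example of Section 4.

```python
import math


def flatten(signs):
    """Combine signs (kappa(a), kappa(not a)) of several arguments for a.

    Takes the minimum of the first components and the maximum of the
    second components, as in the definition of MIN above.
    """
    signs = list(signs)
    if not signs:
        raise ValueError("at least one argument is needed")
    return (min(k for k, _ in signs), max(nk for _, nk in signs))


# Signs of the two arguments for T in the Section 4 example.
print(flatten([(3, 3), (0, math.inf)]))  # (0, inf)
```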

Once the flattening function is established we can use it to provide an overall
procedure for determining the measure of belief in
a formula $q$ in which we are interested. This procedure is:

1. Add a triple $(i : p : s)$ for every formula $p$ whose degree of belief is known;

2. Build $\mathbf{A}^{\Delta}_{q}$, the set of all arguments for $q$ using the rules given in Figure 1;

3. Flatten this set to give $(q, (\kappa(q), \kappa(\neg q)))$.

Given the previous definitions it is possible to show that, given information
about the degree of belief in (that is, the $\kappa$-value associated with) some
formula $p$, the rules of the consequence relation $\vdash_{QRK}$ may be used to soundly
and completely compute arguments concerning the change in the degree of belief
associated with the causes and effects of $p$.

Theorem 13. The construction and flattening of arguments in QRK using the
rules of $\vdash_{QRK}$ is sound with respect to the kappa calculus.

Proof. The proof is by showing the soundness of the combination functions. For
$\mathrm{imp}_{\rightarrow}$: Let us consider the iwff $(i : a \rightarrow b : Sg)$, where $Sg$ is the quadruple of
$\kappa$-values:

$$(\kappa(b|a), \kappa(b|\neg a), \kappa(\neg b|a), \kappa(\neg b|\neg a))$$

From the sign of $a \rightarrow b$ and the sign of $a$, which is $(\kappa(a), \kappa(\neg a))$, we want to be
able to calculate the sign of the formula $b$, which is $(\kappa(b), \kappa(\neg b))$. Since $\kappa$-values
are equivalent to probabilities, we manipulate Jeffrey’s rule [8] and then map
into $\kappa$-values in order to obtain $\kappa(b)$. Jeffrey’s rule for probabilities is:

$$\Pr(b) = \Pr(b|a)\Pr(a) + \Pr(b|\neg a)\Pr(\neg a)$$

which can be mapped into the kappa calculus expression:

$$\kappa(b) = \min\{\kappa(b|a) + \kappa(a), \kappa(b|\neg a) + \kappa(\neg a)\}$$

In the same way we can use Jeffrey’s rule to calculate the probability $\Pr(\neg b)$
from which we obtain the $\kappa$-value formulation for $\kappa(\neg b)$:

$$\kappa(\neg b) = \min\{\kappa(\neg b|a) + \kappa(a), \kappa(\neg b|\neg a) + \kappa(\neg a)\}$$

Since this is exactly the combination function used, $\mathrm{imp}_{\rightarrow}$ is sound. $\mathrm{imp}_{\leftarrow}$ is
slightly less straightforward and requires a few manipulations. This function
calculates the sign of a formula $a$ starting from the iwff $(i : a \rightarrow b : s)$ and the
swff $(j : b : (\kappa(b), \kappa(\neg b)))$. We start by proving the formula for $\kappa(a)$. Jeffrey’s
rule is

$$\Pr(b) = \Pr(b|a)\Pr(a) + \Pr(b|\neg a)\Pr(\neg a)$$

which can be rewritten as:

$$\Pr(b) = \Pr(b|a)\Pr(a) + \Pr(b|\neg a)(1 - \Pr(a))$$

which is

$$\Pr(b) = \Pr(b|a)\Pr(a) + \Pr(b|\neg a) - \Pr(b|\neg a)\Pr(a)$$

therefore $\Pr(a)$ is obtained as:

$$\Pr(a)(\Pr(b|a) - \Pr(b|\neg a)) = \Pr(b) - \Pr(b|\neg a)$$

and so

$$\Pr(a) = \frac{\Pr(b) - \Pr(b|\neg a)}{\Pr(b|a) - \Pr(b|\neg a)}$$

which mapped into the kappa calculus becomes:

$$\kappa(a) = \min\{\kappa(b) - \min[\kappa(b|a), \kappa(b|\neg a) - 1],\; \kappa(b|\neg a) - 1 - \min[\kappa(b|a), \kappa(b|\neg a) - 1]\}$$

where the absolute value is added to make sure $\kappa(a) \geq 0$. In the same way the
expression for $\kappa(\neg a)$ can be computed, thus obtaining:

$$\kappa(\neg a) = \min\{\kappa(b) - \min[\kappa(b|\neg a), \kappa(b|a) - 1],\; \kappa(b|a) - 1 - \min[\kappa(b|\neg a), \kappa(b|a) - 1]\}$$

neg is quite straightforward, following directly from the definition. The soundness
of the flattening function can be proved by demonstrating that if we have two
different arguments for a formula $c$, one from $a$ to $b$ and then to $c$ and the second
from $d$ to $b$ to $c$, then the degree of belief which results from flattening the two
arguments is the same that would be computed were there only one argument
from $a$ and $d$ in combination to $b$, and then from $b$ to $c$, where the combination
is disjunctive, making something like the usual Noisy-Or assumption. In the first
case we have the following chains of formulae:

1. $a \rightarrow b \rightarrow c$

2. $d \rightarrow b \rightarrow c$

and the two swffs $(i : a : (\kappa(a), \kappa(\neg a)))$ and $(j : d : (\kappa(d), \kappa(\neg d)))$. Let us denote
with $\kappa_1(c)$ the degree of belief associated with the first argument and with $\kappa_2(c)$
the degree of belief associated with the second one. By applying $\rightarrow$E twice we
can compute:

$$\kappa_1(c) = \min\{\kappa(c|b) + \min[\kappa(b|a) + \kappa(a), \kappa(b|\neg a) + \kappa(\neg a)],\; \kappa(c|\neg b) + \min[\kappa(\neg b|a) + \kappa(a), \kappa(\neg b|\neg a) + \kappa(\neg a)]\}$$

where we have used the following substitutions:

$$\kappa(b) = \min\{\kappa(b|a) + \kappa(a), \kappa(b|\neg a) + \kappa(\neg a)\}$$

$$\kappa(\neg b) = \min\{\kappa(\neg b|a) + \kappa(a), \kappa(\neg b|\neg a) + \kappa(\neg a)\}$$

Analogously we can compute $\kappa_2(c)$ as:

$$\kappa_2(c) = \min\{\kappa(c|b) + \min[\kappa(b|d) + \kappa(d), \kappa(b|\neg d) + \kappa(\neg d)],\; \kappa(c|\neg b) + \min[\kappa(\neg b|d) + \kappa(d), \kappa(\neg b|\neg d) + \kappa(\neg d)]\}.$$

In order to compute the degree of belief associated with $c$ we need to flatten the
two arguments by using the flattening function defined above. If we flatten them
the resulting degree of belief will be

$$(\min(\kappa_1(c), \kappa_2(c)), \max(\kappa_1(\neg c), \kappa_2(\neg c)))$$

that is, for $c$ the overall $\kappa$-value is:

$$\min\{\min[\kappa(c|b) + \alpha, \kappa(c|\neg b) + \beta], \min[\kappa(c|b) + \gamma, \kappa(c|\neg b) + \delta]\}$$

where:

$$\alpha = \min[\kappa(b|a) + \kappa(a), \kappa(b|\neg a) + \kappa(\neg a)]$$
$$\beta = \min[\kappa(\neg b|a) + \kappa(a), \kappa(\neg b|\neg a) + \kappa(\neg a)]$$
$$\gamma = \min[\kappa(b|d) + \kappa(d), \kappa(b|\neg d) + \kappa(\neg d)]$$
$$\delta = \min[\kappa(\neg b|d) + \kappa(d), \kappa(\neg b|\neg d) + \kappa(\neg d)]$$

This can be rewritten as:

$$\min\{\kappa(c|b) + \min(\alpha, \gamma), \kappa(c|\neg b) + \min(\beta, \delta)\}$$

This result is the same as the degree of belief which would be computed were
the degree of belief in $b$ first computed from a disjunctive dependence on $a$ and
$d$ and the result then propagated to $c$. Something very similar can be carried
out for the $\kappa$-value of $\neg c$, but with max in place of the outer min, thus proving
that QRK flattens arguments soundly. This concludes the proof. □
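
The rewriting step used in the proof, from the nested minimum over the two arguments to the minimum with $\min(\alpha, \gamma)$ and $\min(\beta, \delta)$, can be checked by brute force over small integer values. This is only a sanity check of that algebraic identity, not a replacement for the proof; the function names and the value range are arbitrary choices.

```python
from itertools import product


def flattened_separately(x, y, alpha, beta, gamma, delta):
    """Minimum over the two arguments, each propagated to c on its own."""
    return min(min(x + alpha, y + beta), min(x + gamma, y + delta))


def propagated_jointly(x, y, alpha, beta, gamma, delta):
    """Combine the beliefs in b first, then propagate to c."""
    return min(x + min(alpha, gamma), y + min(beta, delta))


# x stands for kappa(c|b) and y for kappa(c|not b).
values = range(4)
identity_holds = all(
    flattened_separately(*v) == propagated_jointly(*v)
    for v in product(values, repeat=6)
)
print(identity_holds)  # True
```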

Having proved the soundness we can move on to prove completeness, but before 
giving such a proof we need to define what we mean by completeness. 

Definition 14. The construction and flattening of arguments is said to be
complete with respect to some formula $p$ if it is possible to use that system to compute
all the $\kappa$-values of all the effects of $p$, all the causes of $p$ and all the causes and
effects of all the causes and effects of $p$.

With this definition it is now possible to state and prove the following theorem: 

Theorem 15. The construction and flattening of arguments in QRK is com- 
plete with respect to any formula. 

Proof. The proof follows from the definition of $\vdash_{QRK}$, that is, the $\kappa$-values of all
the causes and effects of any well-formed formula $p$ which may be stated in QRK
can be computed by the application of the appropriate proof rules. In proving this
we need to distinguish the proof of completeness for causes from that for effects.
We start from the latter. Let us consider the addition of the triple
$(i : p : (\kappa(p), \kappa(\neg p)))$, where $p$ contains no negation symbols, to a database that contains
only formulae without negation symbols. We can have two types of effect of $p$:
the first are consequents of implications in which $p$ forms the antecedent, while
the second are those effects that are related to $p$ by two or more implications.
In the first case the $\kappa$-values associated with the formula can be computed by
applying the proof rule $\rightarrow$E. In the latter case the degree of belief associated
with the formula may be obtained by recursively applying the $\rightarrow$E rule.

Analogously, we can recognise two types of causes of $p$, those which are
antecedents of implications where $p$ is the consequent and those which are causes
that are related to $p$ by two or more implications. In the first case the $\kappa$-value
associated with the formula is computed by $\rightarrow$R while in the second case the
$\kappa$-value may be obtained by recursively applying $\rightarrow$R.

Applying both $\rightarrow$E and $\rightarrow$R recursively is sufficient to ensure completeness
for situations without negation, and the appropriate use of the rules $\neg$I and $\neg$E
makes it possible to deal with situations in which the negation symbol appears.
□

4 Example

Let us suppose we have the following information about the health of a friend.
The event that our friend has a cold (C) increases the belief that she is sneezing
(S). But also the event R that she has an allergic reaction increases the belief
that she is sneezing. The event T, that our friend has taken some antihistamine,
however, reduces the belief that she is sneezing, while the event that she has an
allergic reaction R increases the belief that she has taken an antihistamine. The
event A, that our friend is allergic to cats, increases the belief that she might
have an allergic reaction. This information may be represented as:

$$\Delta = \left\{ \begin{array}{l}
(r1 : C \rightarrow S : (\kappa(S|C) = 0, \kappa(\neg S|C) = 1, \kappa(S|\neg C) = 2, \kappa(\neg S|\neg C) = 1)) \\
(r2 : R \rightarrow S : (\kappa(S|R) = 0, \kappa(\neg S|R) = 1, \kappa(S|\neg R) = 2, \kappa(\neg S|\neg R) = 1)) \\
(r3 : T \rightarrow S : (\kappa(S|T) = 2, \kappa(\neg S|T) = 1, \kappa(S|\neg T) = 1, \kappa(\neg S|\neg T) = 1)) \\
(r4 : R \rightarrow T : (\kappa(T|R) = 1, \kappa(\neg T|R) = 1, \kappa(T|\neg R) = 4, \kappa(\neg T|\neg R) = 1))
\end{array} \right\}$$

If we believe that our friend is having an allergic reaction, then we can add the
following fact to $\Delta$:

$$(f1 : R : (\kappa(R) = 1, \kappa(\neg R) = 3)).$$

Adding this fact permits us to build two minimal, valid arguments concerning
our friend taking antihistamine:

$$\Delta \vdash_{QRK} (T, \{f1, r4\}, (\kappa_1(T) = 3, \kappa_1(\neg T) = 3)),$$

by applying $\rightarrow$E once, while if we first apply $\rightarrow$E and then $\rightarrow$R we obtain:

$$\Delta \vdash_{QRK} (T, \{f1, r2, r3\}, (\kappa_2(T) = 0, \kappa_2(\neg T) = \infty))$$

By flattening, these combine to give the pair $(T, (\kappa(T) = 0, \kappa(\neg T) = \infty))$ to
indicate that the event that our friend is not taking antihistamine warrants a
much greater degree of disbelief than the event that she is taking antihistamine.

5 Conclusions 

In this paper we have presented QRK, a system of argumentation in which
uncertainty is handled using infinitesimal probability values, in particular values
from the kappa calculus. The use of $\kappa$-values means that the system can be
used when probabilistic knowledge of a domain is incomplete, and this makes
it applicable to a wider range of situations than systems based on
complete probabilistic information. The system associates a $\kappa$-value with every
logical formula, and combines these values as arguments are built in a way which
is sound with respect to the kappa calculus. Thus the arguments which can be
constructed in QRK come complete with an order of magnitude estimate of
the probability of the formula supported by the argument, and the system thus
supports qualitative probabilistic reasoning.

Abstract. Importance measures are a well-known concept developed
in reliability theory. Here, we apply this concept to assumption-based 
reasoning, a field which in fact is quite close to reliability theory. Based 
on quasi-supporting and supporting arguments, we develop two concepts 
of importance measures and show how they are related to the ones from 
reliability theory. 



1 Introduction 

Consider a typical situation in model-based diagnostics as described in general
in (Kohlas et al., 1998; Anrig, 2000): given some knowledge about, for example,
an electronic device in a propositional language over propositions and
assumptions, sometimes together with some observations on the in- and outputs of the
system, one can compute symbolic and numerical diagnoses and conflicts as well
as support, doubt, etc. for the interesting hypotheses. In model-based
diagnostics, we are especially interested in the symbolic and numerical values for the
(minimal) conflicts and the (minimal) diagnoses. The question addressed in this
paper is: which of the assumptions influence the conflicts (or
diagnoses) the most? Which assumptions are more relevant, which ones less relevant,
and which are irrelevant? We will use the term importance instead of relevance,
inspired by reliability theory (Beichelt, 1993); see also Section 4.

The approach considered in this paper is based on computing symbolic ar- 
guments and only afterwards their respective probabilities. In Section 4 we con- 
sider some of the connections between computing conflicts and diagnoses and 
the computation of structure functions in reliability theory. 

In the sequel, we will use the formalism of assumption-based reasoning as 
introduced in (Kohlas et al., 1998, 2000); we refer also to these articles for a 
comparison with the work of Reiter (1987) as well as De Kleer & Williams, 
Provan, Laskey & Lehner, and others. 

The following example will be used throughout this paper to illustrate the 
different concepts. 

* Research supported by grant No. 2000-061454.00 of the Swiss National Foundation 
for Research. 

Example 1. A network with binary gates.

Consider a network which consists of “or” gates $or_1$, $or_2$, $or_3$
and “exclusive-or” gates $xor_1$, $xor_2$ connected as in Fig. ??.
The values of the in- and outputs of the system are assumed
to be observed according to Fig. ??.

The behavior of each component is expressed by a material implication,
so for example the or-gate $or_1$ is specified by

$$or_1 \rightarrow (out \leftrightarrow (in_1 \vee in_2))$$

i.e. if the or-gate is working correctly ($or_1$ true) then its output is on only if at
least one of the inputs is so. Nothing is known about the behavior of a faulty
component. The other gates are modeled analogously. Assume that
the probability of a component working correctly is 0.95 for xor-gates and 0.97
for or-gates. Apparently, the observations in Fig. ?? imply that the system is
not working correctly, because the predicted output (1) at point $f$ is in
conflict with the observed one (0). The minimal conflicts and diagnoses can be
computed using standard techniques (for example using a solver for ABEL, see
Haenni et al., 2000) as sets

$$qs(\bot) = \{xor_1 \wedge or_1 \wedge or_2,\; xor_1 \wedge xor_2 \wedge or_1 \wedge or_3\},$$

$$sp(\top) = \{\neg or_1,\; \neg xor_1,\; \neg or_2 \wedge \neg or_3,\; \neg xor_2 \wedge \neg or_2\}$$

of conjunctions over assumptions in $\{or_1, or_2, or_3, xor_1, xor_2\}$ and, for example,
$p(qs(\bot)) = 0.919$. Now the question is: which assumption is important? Or, in
more detail, which assumption influences the probabilities of the diagnoses
and/or conflicts the most? Which one has no importance? In this example, one
can deduce from the graphical representation of the situation (Fig. ??) that the
or-gate $or_1$ and the xor-gate $xor_1$ are more important than the other components
but there is no component which has no influence at all.

In the remainder of this section, we will introduce the basic definitions for 
the concept of assumption-based reasoning; for further introductory literature 
see (Kohlas et al, 2000). 

A tuple $(\Sigma, P, A, \Pi)$ is called a probabilistic argumentation system
(Kohlas et al., 2000), where $P$ and $A = \{a_1, a_2, \ldots, a_n\}$ are sets of
propositional variables with $A \cap P = \emptyset$; the elements of $P$ are
called propositions, the elements of $A$ assumptions. Let $N \subseteq A \cup P$;
then $\mathcal{L}_N$ denotes the propositional language built over the atoms
$N \cup \{\bot, \top\}$ using the connectors $\land$, $\lor$, $\neg$,
$\rightarrow$, and $\leftrightarrow$, as well as parentheses. The element $\bot$
represents inconsistency and is therefore never true, and $\top$ represents
tautology and is always true. Logical equivalence of formulas from this language
is denoted by "$\equiv$". $\mathcal{C}_N \subseteq \mathcal{L}_N$ is the set of
all the conjunctions in $\mathcal{L}_N$. The consequence operator $\models$ is
defined in the usual way using sets of models.

The set of formulas $\Sigma \subseteq \mathcal{L}_{A \cup P}$ is called the
knowledge base and models the information available, for example the behaviour
of a complex system. We assume that this knowledge base is not contradictory,
i.e. $\bigwedge \Sigma \not\equiv \bot$. Besides, there are formulas from
$\mathcal{L}_P$ which are called observations and will be added to the knowledge
base in some situations.

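$\Pi$ is a set of probabilities $\{p_{a_1}, \ldots, p_{a_n}\}$ such that
$p(a) = p_a$ for every $a \in A$. Here, we assume that the assumptions are
stochastically independent. These probabilities imply a probability
$p : \mathcal{L}_A \rightarrow [0, 1]$ on the propositional language
$\mathcal{L}_A$ by

$$p(f) = \sum_{c = \ell_1 \land \ell_2 \land \cdots \land \ell_n \models f} \;\; \prod_{\ell_i = a_i} p_{a_i} \prod_{\ell_i = \neg a_i} (1 - p_{a_i})$$

for $f \in \mathcal{L}_A$. Algorithms for efficiently computing the probability
$p(f)$ are discussed in (Kohlas & Monney, 1995, Bertschy & Monney, 1996).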
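The definition of $p(f)$ can be illustrated by a brute-force enumeration of all scenarios (complete truth assignments to the assumptions). The following is only a minimal sketch, not one of the efficient algorithms cited above; the names `probability`, `PROBS`, and `conflict` are assumptions made for this illustration, and the formula used is the conflict $qs(\bot)$ of Example 1 with the probabilities stated there.

```python
from itertools import product

def probability(formula, probs):
    """Sum the weights of all assumption scenarios that satisfy `formula`.

    `formula` is a function from a scenario (dict name -> bool) to bool;
    `probs` maps each assumption name to its probability of being true.
    Assumptions are treated as independent.
    """
    names = sorted(probs)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        scenario = dict(zip(names, values))
        if formula(scenario):
            weight = 1.0
            for name in names:
                weight *= probs[name] if scenario[name] else 1.0 - probs[name]
            total += weight
    return total

# Probabilities as stated in Example 1: 0.95 for xor-gates, 0.97 for or-gates.
PROBS = {"xor1": 0.95, "xor2": 0.95, "or1": 0.97, "or2": 0.97, "or3": 0.97}

def conflict(s):
    """qs(bottom) of Example 1, read as a disjunction of its two conjunctions."""
    return (s["xor1"] and s["or1"] and s["or2"]) or (
        s["xor1"] and s["xor2"] and s["or1"] and s["or3"])

p_conflict = probability(conflict, PROBS)
print(round(p_conflict, 3))  # 0.919, the value quoted in Example 1
```

The enumeration is exponential in the number of assumptions, which is why disjoint-form algorithms are preferred in practice.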

2 Importance Based on Differences of Probabilities of 
Conflicts 

The first importance measure depends on the basic notions of conflict and
quasi-support defined in probabilistic argumentation systems. For
$f \in \mathcal{L}_{A \cup P}$ the quasi-support $qs(f, \Sigma)$ and the degree
of quasi-support $dqs(f, \Sigma)$ relative to the knowledge base $\Sigma$ are
defined as

$$qs(f, \Sigma) = \{a \in \mathcal{C}_A : a, \Sigma \models f\}, \qquad dqs(f, \Sigma) = p(qs(f, \Sigma)).$$

Often, we omit $\Sigma$ and consider the sets $qs(f)$ as disjunctions of
conjunctions, i.e. as disjunctive normal forms (DNF). The context will always
indicate which representation is used.

The probability of a quasi-support for $a \in A$ can be computed as

$$dqs(a) = p(a \lor qs(\bot)) = p(a) + p(\neg a \land qs(\bot)), \qquad (1)$$

where the first part depends only on $a$, the second only on $\neg a$. Consider
now the formulas

$$qs_a(\bot) := (qs(\bot))[\top/a] = \{c \in qs(\bot) : a \text{ in } c \text{ is instantiated by } \top\},$$

$$qs_{\neg a}(\bot) := (qs(\bot))[\bot/a] = \{c \in qs(\bot) : a \text{ in } c \text{ is instantiated by } \bot\}.$$

The formulas $qs_a(\bot)$ and $qs_{\neg a}(\bot)$ are independent of $a$ and
$\neg a$, because the variable $a$ does not occur any more. This allows us to
write the degree of quasi-support in the following form:

$$dqs(a) = p(a) + p(\neg a \land qs_{\neg a}(\bot)) = p_a + (1 - p_a)\, p(qs_{\neg a}(\bot)).$$

Consider now the quasi-support of $a$ as a function of the probability $p_a$,
i.e.

$$f_a(p_a) := p_a + (1 - p_a)\, p(qs_{\neg a}(\bot)).$$

This function depends linearly on the probability $p_a$, with
$f_a(0) = p(qs_{\neg a}(\bot))$ and $f_a(1) = 1$.

Using the above ideas, we can say that an assumption $a \in A$ is important if
the difference $|p(qs_a(\bot)) - p(qs_{\neg a}(\bot))|$ is large.

Definition 1. The D-importance of $a \in A$ is
$I^D(a) = |p(qs_a(\bot)) - p(qs_{\neg a}(\bot))|$.

This definition has a nice property, namely that $I^D(a) = I^D(\neg a)$, so that
we can talk of the D-importance of the assumption $a$. We will reconsider this
idea in Section 4, where we will show that this approach is related to
reliability theory and we will argue why it is not our preferred approach. But
first, an example:

Example 2. (Continuation of Example 1) First, the values $p(qs_x(\bot))$ are
computed for $x \in \{xor_1, \neg xor_1, \ldots, or_3, \neg or_3\}$. This
computation is addressed in Section 5. The results are:

    $x$                     $xor_1$   $xor_2$   $or_1$    $or_2$    $or_3$
    $p(qs_x(\bot))$         0.9463    0.9192    0.9662    0.9215    0.9201
    $p(qs_{\neg x}(\bot))$  0         0.8754    0         0.8492    0.8754

This then allows the following D-importances to be computed:

    Component               $xor_1$   $xor_2$   $or_1$    $or_2$    $or_3$
    $I^D$                   0.9463    0.0438    0.9662    0.0723    0.0447

So, the components $xor_1$ and $or_1$ have the highest D-importance, which is
exactly what we have noted in Example 1 using a graphical argumentation.
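The computation behind Example 2 can be sketched as follows: each minimal conflict is a set of positive assumptions, instantiating an assumption by $\top$ removes it from every conflict, and instantiating it by $\bot$ discards every conflict that contains it. This is a brute-force illustration only; the helper names (`instantiate`, `prob_dnf`, `d_importance`) are assumptions of this sketch. One observation that is not stated in the source: the tabulated values appear to be matched when 0.97 is used for xor-gates and 0.95 for or-gates, i.e. the reverse of the assignment given in the text of Example 1, so the sketch uses that reversed assignment as an explicit assumption.

```python
from itertools import product

# Minimal conflicts qs(bottom) of Example 1 (positive literals only).
CONFLICTS = [frozenset({"xor1", "or1", "or2"}),
             frozenset({"xor1", "xor2", "or1", "or3"})]

# ASSUMPTION: reversed w.r.t. the text of Example 1 (see the lead-in);
# with this assignment the tabulated values of Example 2 are matched.
PROBS = {"xor1": 0.97, "xor2": 0.97, "or1": 0.95, "or2": 0.95, "or3": 0.95}

def instantiate(conflicts, name, value):
    """Return qs_a (value=True) or qs_not_a (value=False) for monotone conflicts."""
    if value:
        return [c - {name} for c in conflicts]
    return [c for c in conflicts if name not in c]

def prob_dnf(conflicts, probs):
    """Probability of a disjunction of positive conjunctions, by enumeration."""
    names = sorted(probs)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        true_names = {n for n, v in zip(names, values) if v}
        if any(c <= true_names for c in conflicts):
            weight = 1.0
            for n, v in zip(names, values):
                weight *= probs[n] if v else 1.0 - probs[n]
            total += weight
    return total

def d_importance(name, conflicts=CONFLICTS, probs=PROBS):
    """|p(qs_a) - p(qs_not_a)| as in Definition 1."""
    p_true = prob_dnf(instantiate(conflicts, name, True), probs)
    p_false = prob_dnf(instantiate(conflicts, name, False), probs)
    return abs(p_true - p_false)

D_IMPORTANCE = {name: d_importance(name) for name in PROBS}
for name, value in sorted(D_IMPORTANCE.items()):
    print(name, round(value, 4))
```

The `instantiate` helper relies on the conflicts containing only positive literals; conflicts with negated assumptions would need a more general substitution.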



3 Importance Based on Distances of Degrees of Support 



In the framework of probabilistic argumentation systems, the concept of
quasi-support is considered as a computationally interesting aid for determining
supports. For $f \in \mathcal{L}_{A \cup P}$ define the support $sp(f)$ and
degree of support $dsp(f)$ by

$$sp(f) = \{a \in \mathcal{C}_A : a, \Sigma \models f,\; a, \Sigma \not\models \bot\} = qs(f) - qs(\bot),$$

$$dsp(f) = \frac{p(sp(f))}{1 - p(qs(\bot))} = \frac{p(qs(f)) - p(qs(\bot))}{1 - p(qs(\bot))}.$$

We refer to (Kohlas et al., 1998, 2000) for further discussion of the
interpretation of supports and degrees of support. As above with $qs$, we often
consider the sets $sp(f)$ as DNFs.

For the degree of support, $p_a = 0$ implies $dsp(a) = 0$ and $p_a = 1$ implies
$dsp(a) = 1$, but $dsp(a)$ does not depend linearly on $p_a$. Again, we try to
separate the probability $p(a)$ of $a$ from the rest of the information, i.e.
using definitions from above we get

$$dsp(a) = \frac{p(a \lor qs(\bot)) - p(qs(\bot))}{1 - p(qs(\bot))} = \frac{p_a - p_a\, p(qs_a(\bot))}{1 - p_a\, p(qs_a(\bot)) - (1 - p_a)\, p(qs_{\neg a}(\bot))}. \qquad (2)$$

Using an idea analogous to the one in the previous section, we can define the
degree of support of $a$ as a function of the probability $p_a$:

$$g_a(p_a) := \begin{cases} 0 & \text{if } p_a = 0, \\[4pt] \dfrac{1 - p(qs_a(\bot))}{\frac{1}{p_a}\,(1 - p(qs_{\neg a}(\bot))) - p(qs_a(\bot)) + p(qs_{\neg a}(\bot))} & \text{otherwise.} \end{cases} \qquad (3)$$

In the sequel, it is often more convenient to consider $g_a$ as a function of
three arguments, i.e. we define the function $\gamma$ as

$$\gamma(x, p_1, p_2) := \begin{cases} 0 & \text{if } x = 0, \\[4pt] \dfrac{1 - p_1}{\frac{1}{x}\,(1 - p_2) - p_1 + p_2} & \text{otherwise} \end{cases} \qquad (4)$$

for every $0 \le p_1, p_2 < 1$.¹ So we have
$g_a(p_a) = \gamma(p_a, p(qs_a(\bot)), p(qs_{\neg a}(\bot)))$.

Lemma 1. In the interval $[0, 1]$ the function $g_a$ is symmetric with respect
to the function $x \mapsto 1 - x$, i.e. $g_a(1 - g_a(p_a)) = 1 - p_a$ for
$p_a \in [0, 1]$.

In the interval $]0, 1[$ the function $g_a$ is strictly monotone increasing.

For proofs see (Anrig, 2001).
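The two statements of Lemma 1 can be checked numerically. The sketch below assumes that $\gamma(x, p_1, p_2) = (1 - p_1) / ((1 - p_2)/x - p_1 + p_2)$ for $x \neq 0$ and $\gamma(0, p_1, p_2) = 0$, which is our reading of definition (4); the function name `gamma` and the sample parameters are choices made for this illustration, and a numerical check on a grid is of course not a proof.

```python
def gamma(x, p1, p2):
    """Degree of support as a function of x = p_a, with p1 = p(qs_a) and
    p2 = p(qs_not_a); both p1 and p2 are assumed to lie in [0, 1[."""
    if x == 0:
        return 0.0
    return (1.0 - p1) / ((1.0 - p2) / x - p1 + p2)

# Sample parameter pairs: the first three are taken from the table of Example 2,
# the last one is an arbitrary pair with p1 < p2.
SAMPLES = [(0.9463, 0.0), (0.9192, 0.8754), (0.9215, 0.8492), (0.2, 0.6)]
GRID = [i / 20 for i in range(21)]

def max_symmetry_error(p1, p2):
    """Largest deviation from gamma(1 - gamma(x)) = 1 - x on the grid."""
    return max(abs(gamma(1.0 - gamma(x, p1, p2), p1, p2) - (1.0 - x))
               for x in GRID)

def is_strictly_increasing(p1, p2):
    values = [gamma(x, p1, p2) for x in GRID]
    return all(a < b for a, b in zip(values, values[1:]))

for p1, p2 in SAMPLES:
    print(p1, p2, max_symmetry_error(p1, p2), is_strictly_increasing(p1, p2))
```

For $p_1 = p_2$ the same function reduces to the identity $x \mapsto x$, which is the case of an assumption without influence discussed later in this section.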



Let us now consider the situation where two different assumptions $a$ and $b$
and their corresponding functions $g_a$ and $g_b$ are given.

Lemma 2. Given two functions $g_a$ and $g_b$ as above, if there exists a point
$x \in\, ]0, 1[$ for which $g_a(x) > g_b(x)$, then $g_a > g_b$ in the interval
$]0, 1[$.

In other words, there is a complete order on the set of functions
$\{g_a : a \in A\}$ in the interval $]0, 1[$. This order will be formalized in
the following.



Example 3. (Cont. of Ex. 2)

We compute the five functions $g_{or_1}$, $g_{or_2}$, $g_{or_3}$, $g_{xor_1}$,
and $g_{xor_2}$ using (2); this means that we have to compute the graph of
$\gamma$ for five different situations, so for example for $xor_1$ we have
$g_{xor_1}(p_{xor_1}) = \gamma(p_{xor_1}, 0.9463, 0)$. See Fig. 2 for the graphs
of these functions.

Fig. 2. ($g_{xor_2}$ and $g_{or_3}$ are nearly equal).



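We will characterize in the sequel every function $g_a$ by its maximal distance
to the function $x \mapsto x$, i.e. we define

$$d_a := \max_{0 \le x \le 1} d\left( (x, g_a(x)),\; \left( \tfrac{x + g_a(x)}{2}, \tfrac{x + g_a(x)}{2} \right) \right)$$

or, using the definition of $\gamma$ in (4) for
$\gamma(\cdot, p_1, p_2) = g_a(\cdot)$,

$$d_{p_1,p_2} := \max_{0 \le x \le 1} d\left( (x, \gamma(x, p_1, p_2)),\; \left( \tfrac{x + \gamma(x, p_1, p_2)}{2}, \tfrac{x + \gamma(x, p_1, p_2)}{2} \right) \right),$$

where $d$ is the usual Euclidean distance function. The function $x \mapsto x$,
which is equal to $\gamma(\cdot, p, p)$ for any $0 \le p < 1$, clearly has
distance 0.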

¹ The case $p_1 = 1$ or $p_2 = 1$ is excluded here.

Lemma 3. Let $0 \le p_1, p_2 < 1$. Then

$$d_{p_1,p_2} = d((0.5, 0.5), (x, 1 - x)) = |x - 0.5|\,\sqrt{2},$$

where $x$ is the unique solution of $\gamma(x, p_1, p_2) = 1 - x$ in the
interval $]0, 1[$, i.e.

$$x = \begin{cases} 0.5 & \text{if } p_1 = p_2, \\[4pt] \dfrac{1 - p_2 - \sqrt{(1 - p_1)(1 - p_2)}}{p_1 - p_2} & \text{otherwise.} \end{cases} \qquad (5)$$

The following lemma shows that this is indeed a unique characterization up to
equivalence:

Lemma 4. The mapping $\gamma(\cdot, p_1, p_2) \mapsto d_{p_1,p_2}$ is

— injective in the sense that for every pair of functions
$\gamma(\cdot, p_1, p_2)$, $\gamma(\cdot, p_1', p_2')$ which satisfies
$\gamma(x, p_1, p_2) \neq \gamma(x, p_1', p_2')$ for some $x \in [0, 1]$, we
have $d_{p_1,p_2} \neq d_{p_1',p_2'}$, and

— surjective with respect to the interval $[0, \tfrac{1}{2}\sqrt{2}[$, i.e. for
every $\alpha \in [0, \tfrac{1}{2}\sqrt{2}[$ there are parameters
$0 \le p_1, p_2 < 1$ so that $d_{p_1,p_2} = \alpha$.

Corollary 1. For $0 \le p_1, p_2 < 1$, there is a $0 \le p < 1$ so that
$d_{p_1,p_2} = d_{p,0}$.

For $a \in A$, we have $dsp(a) + dsp(\neg a) = 1$, which is a standard result of
probabilistic argumentation systems (Kohlas et al., 1998). This implies that
$g_a(x) + g_{\neg a}(1 - x) = 1$, which means that there is a symmetry between
$g_a$ and $g_{\neg a}$ with respect to the function $x \mapsto x$. So the
distance depends in fact on the assumption and not on the literal. Therefore,
we can speak of the distance $d_a$ of an assumption $a$:

Lemma 5. For every $0 \le p_1, p_2 < 1$, we have $d_{p_1,p_2} = d_{p_2,p_1}$.
For every $a \in A$ we have $d_a = d_{\neg a}$.

Lemma 4 states that for a given $d$ there are $p_1$ and $p_2$ with
$d_{p_1,p_2} = d$, but there is no unique solution. Nevertheless, Corollary 1
states that for every pair $(p_1, p_2)$, there is a uniquely defined $p$ so that
$d_{p_1,p_2} = d_{p,0}$, i.e. we have a normal form. Lemma 5 shows that in this
case we even have $d_{p_1,p_2} = d_{p,0} = d_{0,p}$. This is recapitulated in
the following corollary:

Corollary 2. For $0 \le p_1, p_2 < 1$ there is a uniquely defined parameter $p$
with $0 \le p < 1$ so that $d_{p_1,p_2} = d_{p,0} = d_{0,p}$.

This corollary implies that the parameter space $[0, 1[ \times [0, 1[$ has a
class structure, i.e. every function $\gamma(\cdot, p_1, p_2)$ can be
represented by a parameter $p \in [0, 1[$.

Consider an assumption $a \in A$ with probability $p_a = p(a)$ and the function
$\gamma(p_a, p, p)$ for a fixed parameter $0 \le p < 1$. This function
represents a degree of support which depends only on the value $p_a$ but not on
$p(qs(\bot))$, because, for $x \neq 0$, clearly $\gamma(x, p, p) = x$, and using
the definition of $\gamma$, also $\gamma(0, p, p) = 0$. This means that this
assumption $a$ does not influence the probability of the conflicts
$p(qs(\bot))$ or of the diagnoses $p(sp(\top))$ at all. Furthermore, the degree
of support of any formula which consists of assumptions in $A - \{a\}$ does not
depend on $p_a$. The distance $d_a = d_{p,p}$ of this variable is zero.
Therefore we say that this assumption $a$ is unimportant to the present
situation. The larger the distance $d_a$ of an assumption $a$ is, the more
influence the probability of this assumption has on the probability of the
conflicts $p(qs(\bot))$ (and therefore also on the probability of the
diagnoses). So we define the $\delta$-importance as a normalization of the
distance $d$ to the interval $[0, 1]$:

Definition 2. For $a \in A$ we define its $\delta$-importance by
$I^\delta(a) = d_a \sqrt{2}$.

Example 4. (Continuation of Example 3) For the arithmetical network, we get the
following $\delta$-importances using the results from (2):

    Component        $xor_1$   $xor_2$   $or_1$    $or_2$    $or_3$
    $I^\delta$       0.6236    0.1078    0.6894    0.1618    0.1106

This result is qualitatively equal to the one computed in Example 2.
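The values of Example 4 can be recomputed from the table of Example 2. The sketch below assumes our reading of Lemma 3 and Definition 2: the crossing point is $x = (1 - p_2 - \sqrt{(1 - p_1)(1 - p_2)}) / (p_1 - p_2)$ for $p_1 \neq p_2$ and $x = 0.5$ otherwise, the distance is $|x - 0.5|\sqrt{2}$, and the $\delta$-importance is that distance multiplied by $\sqrt{2}$. The function names and the dictionary `TABLE_EXAMPLE_2` are assumptions of this illustration; its entries are the rounded values printed in Example 2, so small rounding differences in the fourth decimal are to be expected.

```python
from math import sqrt

def crossing_point(p1, p2):
    """Solution x in ]0, 1[ of gamma(x, p1, p2) = 1 - x (closed form of Lemma 3)."""
    if p1 == p2:
        return 0.5
    return (1.0 - p2 - sqrt((1.0 - p1) * (1.0 - p2))) / (p1 - p2)

def distance(p1, p2):
    """Maximal Euclidean distance between the graph of gamma and the diagonal."""
    return abs(crossing_point(p1, p2) - 0.5) * sqrt(2.0)

def delta_importance(p1, p2):
    """Distance normalized to [0, 1], i.e. d * sqrt(2) = 2 * |x - 0.5|."""
    return distance(p1, p2) * sqrt(2.0)

# (p(qs_x), p(qs_not_x)) as printed (rounded) in the table of Example 2.
TABLE_EXAMPLE_2 = {
    "xor1": (0.9463, 0.0),
    "xor2": (0.9192, 0.8754),
    "or1": (0.9662, 0.0),
    "or2": (0.9215, 0.8492),
    "or3": (0.9201, 0.8754),
}

DELTA = {name: delta_importance(p1, p2)
         for name, (p1, p2) in TABLE_EXAMPLE_2.items()}
for name, value in DELTA.items():
    print(name, round(value, 3))
```

Swapping the two arguments leaves the result unchanged, which is the symmetry $d_{p_1,p_2} = d_{p_2,p_1}$ stated in Lemma 5.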



4 Importance Measures from Reliability Theory 

4.1 Structure Functions versus Conflicts

In reliability theory, the concept of structure function is fundamental for
importance measures. A structure function $\varphi$ is a binary function which
describes the state $z_s$ of a system in dependence of the states $z_i$ of the
elements (or components) $i = 1, \ldots, n$, i.e.
$z_s = \varphi(z_1, \ldots, z_n)$. The structure function is therefore a logical
description of all system states $(z_1, \ldots, z_n) \in \{0, 1\}^n$ for which
the system is functioning, i.e. for which $\varphi(z_1, \ldots, z_n) = 1$.
Different methods for computing this function are known, see (Kohlas, 1987,
Beichelt, 1993) for further literature.

A basic problem in reliability theory is to transform the structure function in 
a so-called disjoint form so that computing the reliability of the system becomes 
possible. This is very similar to the computations of probabilities of support in 
our context (cf. Section 1). 

Consider the simple device in Fig. 3. The system consists of an input $i$, a
component $e_1$ connected serially with two components $e_2$ and $e_3$ connected
in parallel, and an output $o$. The connection between input and output is
established if $e_1$ and either $e_2$ or $e_3$ are working. Therefore, the
structure function is $\varphi(z_1, z_2, z_3) = z_1 \land (z_2 \lor z_3)$, where
$z_i$ denotes the state of component $e_i$. Note that this structure function
already includes the desired behaviour, namely that the input $i$ is connected
with the output $o$.

Fig. 3. A simple device

The same example, modeled in our logical framework, reads as follows: we have
three assumptions, i.e. $A = \{ok_1, ok_2, ok_3\}$, and three propositions
$P = \{i, o, x\}$, where the assumption $ok_i$ is true if the component $e_i$ is
functioning (i.e. if $z_i = 1$), and
$\Sigma = \{ok_1 \rightarrow (i \leftrightarrow x),\; ok_2 \rightarrow (x \leftrightarrow o),\; ok_3 \rightarrow (x \leftrightarrow o)\}$.
Now, this modeling does not contain the desired behavior, i.e.
$i \leftrightarrow o$, because in our diagnostic environment, we are a priori
interested not only in this problem. Nevertheless, this information can be taken
into consideration in our approach. We just add the negation of this information
to the knowledge base, i.e. we define the updated knowledge base by
$\Sigma' = \Sigma \cup \{\neg(i \leftrightarrow o)\}$ and then, the diagnoses
w.r.t. $\Sigma'$ are just the same as the structure function $\varphi$ above,
i.e.

$$sp(\top, \Sigma') = (ok_1 \land ok_2) \lor (ok_1 \land ok_3).$$

Note that $qs(i \leftrightarrow o, \Sigma) = sp(\top, \Sigma')$ allows another
viewpoint of this situation.
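The relation between the structure function of the simple device and the quasi-support of $i \leftrightarrow o$ can be checked by brute force. The following sketch encodes the three implications of the knowledge base directly and tests, for every scenario over $ok_1, ok_2, ok_3$, whether the knowledge base entails $i \leftrightarrow o$. The function names are assumptions of this illustration, and the component probabilities 0.9, 0.8, 0.7 used for the reliability are hypothetical values that do not come from the source.

```python
from itertools import product

BOOLS = [True, False]

def structure(z1, z2, z3):
    """Structure function of the device: e1 in series with (e2 parallel e3)."""
    return z1 and (z2 or z3)

def knowledge_base(ok1, ok2, ok3, i, o, x):
    """ok1 -> (i <-> x), ok2 -> (x <-> o), ok3 -> (x <-> o)."""
    return ((not ok1 or i == x)
            and (not ok2 or x == o)
            and (not ok3 or x == o))

def supports_connection(ok1, ok2, ok3):
    """True if the scenario, together with the knowledge base, entails i <-> o."""
    return all(i == o
               for i, o, x in product(BOOLS, repeat=3)
               if knowledge_base(ok1, ok2, ok3, i, o, x))

def reliability(p1, p2, p3):
    """Probability that the structure function has value 1 (independent parts)."""
    total = 0.0
    for z1, z2, z3 in product(BOOLS, repeat=3):
        if structure(z1, z2, z3):
            total += ((p1 if z1 else 1.0 - p1)
                      * (p2 if z2 else 1.0 - p2)
                      * (p3 if z3 else 1.0 - p3))
    return total

SAME = all(structure(*s) == supports_connection(*s)
           for s in product(BOOLS, repeat=3))
print(SAME)                        # True
print(reliability(0.9, 0.8, 0.7))  # hypothetical probabilities, about 0.846
```

The scenarios that entail $i \leftrightarrow o$ are exactly those with $ok_1 \land (ok_2 \lor ok_3)$, i.e. the states in which the structure function has value 1.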

So our framework of assumption-based reasoning is closely related to the 
framework of reliability theory and we can use our algorithms to compute struc- 
ture functions, see (Kohlas et al., 2000) for an overview and (Haenni, 2001) 
for new approximation techniques. Further, the implementation of ABEL 
(Haenni et al., 2000) is capable of computing structure functions and there- 
fore also the reliability of a system. For some further discussion and literature 
about the connections between model-based diagnosis and reliability theory see 
(Anrig, 2001). 

4.2 Importance Measures in Reliability Theory 

Importance measures are a well-known concept in reliability theory, where im- 
portance is understood as a measure of the criticality of a component. Several 
questions are addressed; for example (following Beichelt, 1993): 

1. What impact does a component have on the reliability of a system?

2. How does the reliability of the system depend on the reliability of a
component?

3. Which component has the highest probability of causing a failure of the
system?

4. Which components have the highest probability of causing a failure of the
system?

In Section 4.3, we introduce the Birnbaum-Importance, which gives answers to
the first two questions. Questions 3 and 4 are not treated here, because their
answer is given by what is known as degrees of support for components in
our approach, yet see also (Kohlas et al., 2001). In reliability theory, these
two questions are usually answered by the Fussel-Vesley-Importance (Vesley,
1970, Viswanadham et al., 1987). The idea of this importance measure is quite
close to a degree of support, cf. Section 4.4. If no probabilities are known,
then the structural importance can be used (Anrig, 2001). Note that in contrast
to reliability theory, we will consider importances here independent of a time
parameter.

4.3 Birnbaum-Importance 

The Birnbaum-Importance was introduced in (Birnbaum, 1969); it is also called
reliability importance (Barlow & Proschan, 1975). We follow here the
description in (Beichelt, 1993). The Birnbaum-Importance of an element $i$ is
defined by

$$I(i, \mathbf{p}) = \frac{\partial h(\mathbf{p})}{\partial p_i} \qquad (6)$$

where $h$ is the structure function and $h(\mathbf{p})$ the availability of the
system depending on the vector $\mathbf{p} = (p_1, p_2, \ldots, p_n)$ of the
probabilities of the elements. This can also be written as
$I(i, \mathbf{p}) = h((1_i, \mathbf{p})) - h((0_i, \mathbf{p}))$, where
$(x_i, \mathbf{p})$ denotes the vector $\mathbf{p}$ in which the $i$-th place is
replaced by $x$.

This formula can be used in our context, where the structure function $h$ is
represented by the diagnoses $sp(\top) = \neg qs(\bot)$ and $h(\mathbf{p})$ by
$p(sp(\top)) = 1 - p(qs(\bot))$ with respect to the extended knowledge base
(cf. Section 4.1). This implies that $h((1_i, \mathbf{p}))$ and
$h((0_i, \mathbf{p}))$ are computed in our context by $1 - p(qs_a(\bot))$ and
$1 - p(qs_{\neg a}(\bot))$ respectively.



Definition 3. The Birnbaum-importance of an assumption $a \in A$ is

$$I^B(a) = \left| \frac{\partial p(sp(\top))}{\partial p(a)} \right| = |p(qs_a(\bot)) - p(qs_{\neg a}(\bot))|.$$

Note that in the original definition (6), there is no need to take the absolute
value because in reliability theory, the function $h$ is monotone, but in the
more general framework of assumption-based reasoning, this monotonicity property
does not hold anymore. Nevertheless, considering that assumptions in our
framework are a generalization of variables in reliability theory, it makes
sense to take the absolute value, which implies the nice property
$I^B(a) = I^B(\neg a)$, so that we can talk of the Birnbaum-importance of the
assumption $a$.

The idea presented here is just the D-importance introduced in Section 2:

Lemma 6. $I^D(a) = I^B(a)$ for every $a \in A$.
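The equality stated in Lemma 6 can be made plausible by a short calculation. This is a sketch rather than the proof of the source; it uses only the facts that $qs_a(\bot)$ and $qs_{\neg a}(\bot)$ do not contain $a$ and that the assumptions are independent, so that $p(qs(\bot))$ can be expanded with respect to $a$.

```latex
\begin{align*}
p(qs(\bot)) &= p_a \, p(qs_a(\bot)) + (1 - p_a)\, p(qs_{\neg a}(\bot)), \\
p(sp(\top)) &= 1 - p(qs(\bot))
             = 1 - p_a \, p(qs_a(\bot)) - (1 - p_a)\, p(qs_{\neg a}(\bot)), \\
\frac{\partial\, p(sp(\top))}{\partial p_a}
            &= p(qs_{\neg a}(\bot)) - p(qs_a(\bot)), \\
I^B(a) &= \left| \frac{\partial\, p(sp(\top))}{\partial p_a} \right|
        = \bigl| p(qs_a(\bot)) - p(qs_{\neg a}(\bot)) \bigr| = I^D(a).
\end{align*}
```

Since $p(sp(\top))$ is linear in $p_a$, the derivative coincides with the difference of the two values obtained by instantiating $a$ with $\top$ and with $\bot$, in line with the difference form of the Birnbaum-importance.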



4.4 Fussel-Vesley-Importance 

The Fussel-Vesley-Importance was introduced by Vesley (1970) and used by
Fussel (1973) in the context of fault trees. This importance measure depends
on basic events, which in our framework can be interpreted as literals of
assumptions. Here we will not consider the time dependence of the importance, so
for a basic event $B_i$, according to (Viswanadham et al., 1987), the
Fussel-Vesley importance is $I^{FV}(B_i) = Q(B_i)/Q_s$, where $Q(B_i)$ denotes
the probability that the structure function for the union of the cut sets
containing $B_i$ has value 1 and $Q_s$ is the probability of a system failure.

In our framework, we will omit the factor $Q_s$ because we only compare items
with respect to the same knowledge, i.e. in a given situation the probability of
a system failure is constant. The Fussel-Vesley importance can thus be defined
in our framework by $I^{FV}(a) = dsp(a)$, but note that in general
$I^{FV}(\neg a)$ is different from $I^{FV}(a)$, i.e. we cannot speak of the
importance of the assumption $a$. In reliability theory and especially in the
framework of fault trees, where this importance measure was used by Fussel, the
formulas are monotone, i.e. they contain only positive literals. Hence, there is
no need to consider negated literals or their importance. Yet in the present
more general context, we are mainly interested in an importance value defined
with respect to the variable (and not to the literal). Clearly, in our framework
we have $dsp(a) + dsp(\neg a) = 1$ because $a$ is an assumption, and therefore
$I^{FV}(\neg a) = 1 - I^{FV}(a)$. But this does not tell us which one of these
values we should take into consideration for the computation of the importance
of $a$, and we do not know how to combine these two pieces of information. So we
will not consider this importance measure in the sequel.

5 Computation of Importances 

For an assumption $a \in A$, the basic probabilities which have to be computed
are $p(qs_a(\bot))$ and $p(qs_{\neg a}(\bot))$. In general, there are two
situations to consider:

If the (minimal) conflicts $qs(\bot)$ are known, the probabilities we are
interested in can be computed as follows: first, an equivalent, disjoint form of
the formula $qs(\bot)$ is computed using the well-known inclusion-exclusion
formula for computing probabilities of unions of events in probability theory or
one of the much more efficient algorithms using disjoint forms (Kohlas & Monney,
1995, Bertschy & Monney, 1996). So given the conflicts $qs(\bot)$, these
algorithms compute a set of formulas $C$ so that $qs(\bot) \equiv \sum C$, where
the sum denotes the disjoint union. Then for $a \in A$ (and analogously for
$\neg a$),

$$p(qs_a(\bot)) = p\left( \sum_{c \in C} c_a \right) = \sum_{c \in C} p(c_a),$$

where $c_a$ denotes the formula $c$ in which $a$ is instantiated by $\top$. When
using one of the efficient algorithms mentioned above, the formulas $c$ have a
simple form and the computation of $p(c_a)$ can be done quite easily. This
process is also called literal-conjoining (Darwiche, 2001) and can be applied to
any logical structure for computing probabilities of formulas when a variable is
instantiated.
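The first situation can be sketched for the conflicts of Example 1. The code below compares the inclusion-exclusion formula with a hand-written disjoint form and then instantiates an assumption by $\top$ term by term. It is an illustration only: the disjoint form $c_1 \lor (c_2 \land \neg or_2)$ was derived by hand for this example and is not the output of the cited algorithms, the helper names are assumptions, and the probabilities are those stated in the text of Example 1.

```python
from itertools import combinations

# Probabilities as stated in Example 1 (0.95 for xor-gates, 0.97 for or-gates).
PROBS = {"xor1": 0.95, "xor2": 0.95, "or1": 0.97, "or2": 0.97, "or3": 0.97}

CONFLICTS = [frozenset({"xor1", "or1", "or2"}),
             frozenset({"xor1", "xor2", "or1", "or3"})]

# Hand-derived disjoint form: each term is (positive literals, negated literals).
DISJOINT = [(frozenset({"xor1", "or1", "or2"}), frozenset()),
            (frozenset({"xor1", "xor2", "or1", "or3"}), frozenset({"or2"}))]

def prob_conj(pos, neg, probs):
    """Probability of a conjunction of independent literals."""
    value = 1.0
    for name in pos:
        value *= probs[name]
    for name in neg:
        value *= 1.0 - probs[name]
    return value

def inclusion_exclusion(conflicts, probs):
    """p(c_1 or ... or c_k) for conjunctions of positive literals."""
    total = 0.0
    for k in range(1, len(conflicts) + 1):
        for subset in combinations(conflicts, k):
            union = frozenset().union(*subset)
            total += (-1) ** (k + 1) * prob_conj(union, frozenset(), probs)
    return total

def prob_disjoint(terms, probs):
    """Probabilities of mutually exclusive terms simply add up."""
    return sum(prob_conj(pos, neg, probs) for pos, neg in terms)

def instantiate_true(terms, name):
    """Replace `name` by true: drop it where positive, drop terms negating it."""
    return [(pos - {name}, neg) for pos, neg in terms if name not in neg]

print(round(inclusion_exclusion(CONFLICTS, PROBS), 4))
print(round(prob_disjoint(DISJOINT, PROBS), 4))
print(round(prob_disjoint(instantiate_true(DISJOINT, "or2"), PROBS), 4))
```

Instantiation preserves disjointness of the remaining terms, which is why the probabilities of the instantiated terms can still be summed.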

If the conflicts $qs(\bot)$ have not yet been computed, there is another
possibility to compute the probabilities $p(qs_a(\bot))$. Generalizing concepts
of Lauritzen & Spiegelhalter (1988), Shenoy & Shafer developed a concept called
Computation in Hypertrees (or Valuation Networks) for local computation
(Shenoy & Shafer, 1990, Lauritzen & Shenoy, 1995). This concept has been used
for several formalisms and here, we are especially interested in its application
to belief function propagation. This subject and corresponding algorithms are
presented in detail in (Lehmann, 2001) and we only outline the idea here.
Consider the knowledge base $\Sigma$ in the language $\mathcal{L}_{A \cup P}$.
Now, a partition of this knowledge base is computed according to the assumptions
$A$, i.e. the knowledge base is divided into several parts
$\Sigma_1, \ldots, \Sigma_\ell$ so that
$\Sigma_i \subseteq \mathcal{L}_{P_i \cup A_i}$ and $A_i \cap A_j = \emptyset$
for every $i \neq j$ with $P_i \subseteq P$ and $A_i \subseteq A$. There are two
main goals in this process of partitioning: first, the sets $A_i$ should be
small in order to keep the reasoning process in $\mathcal{L}_{P_i \cup A_i}$
feasible. Second, a "good" hypertree should be computable from the sets
$P_1, \ldots, P_\ell$. Consider now the case where the set
$\{P_1, \ldots, P_\ell\}$ is already a hypertree construction sequence (for the
more general case see Lehmann, 2001). Then the knowledge $\Sigma_i$ can be
transformed into a belief function $bel_i$ and these belief functions are
propagated on the hypertree according to (Lehmann, 2001). This scheme allows
efficient computation of $p(qs_a(\bot))$ and $p(qs_{\neg a}(\bot))$ for
$a \in A$ by instantiating $a$ locally to $\top$ or $\bot$ respectively.

Approximation techniques allow upper and lower bounds to be determined for
the beliefs we have to compute (Haenni & Lehmann, 2001, Lehmann, 2001).
Clearly, for computing importances, approximated values of these beliefs
are usually sufficient, and their computation is therefore quite fast.

6 Conclusions 

The term importance is well-known in reliability theory. In this paper, we have
introduced an analogous concept for the framework of assumption-based reasoning.
Two different importance measures appear to be interesting for us: D-importance
and $\delta$-importance. The former is equivalent to the so-called
Birnbaum-importance from reliability theory, whereas the latter is new and
essentially based on supporting arguments. One has to be aware of the fact that
the concept of importance of an assumption is different from the ideas of
expressing the gain of testing the actual value of the assumption and also from
the concept of likeliness of the assumption.

A first conjecture was that in monotone situations, $\delta$-importance and
Birnbaum-importance yield qualitatively equal results, but this is not true.
Several examples (see Anrig, 2001) show that the results of the two importance
measures agree quite often qualitatively, but if they disagree, the new concept
of $\delta$-importance seems to reflect the "right" concept in our framework.

Consider the case where we are interested in symbolic results of the
computation, for example the most important arguments for a given hypothesis.
Further work has to be done in the direction of focusing the symbolic
computation on important assumptions. This can be done by first computing
numerically the importances of the assumptions and then focusing the symbolic
computation on the important assumptions using cost-bounded argumentation
(Haenni, 2001). However, the integration of importance measures and
approximation as well as the connections between reliability and probabilistic
argumentation systems are subject to further research.



Acknowledgments. The author thanks the anonymous referees as well as Jürg

Abstract. We present a method to tackle indirect effects in the context of
reasoning about actions. The framework is based on the normative theory of
causality. The method we propose is close to Thielscher's method. To
represent ramifications, we use directed relations between two single effects 
called indirect effects relationships, stating under which conditions the 
occurrence of the second effect follows the occurrence of the first. Our 
relationships represent both instantaneous and delayed indirect effects. 



1. Introduction 

The ramification problem aroused the interest of the research community in the
early 1990s [2], [3], [6], [7], [10], [12]. It was defined, in the context of
reasoning about actions, as the problem of describing all the indirect effects
of actions. By indirect effects we mean the overflow effects that appear after
an action is performed. As an example, we consider the action "to win in lotto",
which has as its direct effect the fact "to become rich". This effect raises
other (indirect) effects like "move to another house", "buy a new car", "pay
more taxes", etc. To handle this kind of effect, many methods have been proposed
in the literature. Most of them use domain constraints, which are formulas that
represent static dependencies existing between world components. These formulas
are verified in every valid state of the world. The use of domain constraints
alone is not sufficient to generate the expected indirect effects, as was shown
in [6], [7], and [12].

In this paper, we deal with the ramification problem in the framework of the
normative method of causality [5], [9]. The solution we propose to compute all
the indirect effects of actions uses relations called "indirect effect
relationships". These relationships are inspired by Thielscher's causal
relationships [12]. Like causal relationships, indirect effect relationships are
systematically generated from domain constraints and influence information.
However, our method differs from Thielscher's in that it handles delayed
indirect effects. Moreover, indirect effect relationships are not applied
randomly but according to the order in which effects are generated. This order
makes it possible to handle correctly examples like the stain cloth-table one
[13].

In the next section, we briefly give the fundamentals of the normative method
of causality, on which our framework is based. We then present our method to
handle ramifications. Thereafter, we give the algorithm for the systematic
generation of indirect effect relationships. We conclude with a brief discussion
of related work and a summary.



S. Benferhat and P. Besnard (Eds.): ECSQARU 2001, LNAI 2143, pp. 704-713, 2001.
© Springer-Verlag Berlin Heidelberg 2001

2. Normative Method of Causality Basics

The normative method of causality [5], [9] is based on an interventionist
concept of causality where an agent has the choice to perform an action or not
(free will). It is also based on the principle which stipulates that "an action
may cause one or many effects". We introduce some definitions.*

Time has been explicitly defined by means of time points. 

A time point is defined by a subset of propositions and by a date. A time point is a 
universe snapshot where the propositions of the point are true at the date of the 
snapshot. Let T be the set of all time points. 

The date of a time point is defined by the following function : 
date : T R, where date{t) = d (noted d) means that d is the date of the time point t. 

A time line is a set of time points which are in bijection with a set of dates. It 
represents a possible evolution of universe. Let L be the set of all time lines. 

The time points of a time line are totally ordered by the precedence relation noted 
" ", where t, L means that t^ doesn’t precede t,. Whenever t, L we have d,, <7,2- 

The precedence relation expresses the principle “no effect precedes its cause”. 

The representation of the notion of free will required a structure of time with a 
branching in the future. A branching in the past has been also required to examine 
different courses of events leading to the same situation. 

Two time lines Z, and coincide until a time point t (noted coincided, ,l^,t) ) iff Vt’ 
t ,t’ /, f 

The set of preferred time lines for line I at time point t (noted Lp(l,d)) is defined as 
a function : 

Lp:L R t 

such that : V/’(Z’ Lp(l,d) coincide{lf\t)). 

The proposed language is defined on two levels : 

1. The first level represents static information. It’s a plain propositional language in 
which : T" is a set of propositions we are interested in, A is a set of actions, £ is a 
set of effects (facts, events, ...), with A E= and£ = A E. 

A first level formula is either an action formula, or an effect formula. 

Let FOR(A) (respectively FOR{E)) be the set of action (respectively effect) formulas. 
An effect literal is either an effect, or an effect negation. The set of effect literal is 
noted L/T(£). It is defined as: L/r(£) = £ { e:e £}. 

2. The second expresses dynamic information represented by formulas of the form: 

v(p,l,d), which means that formula p is true in time line Z at date d. 

Causality is expressed by “normal causality” operator, noted " ". a e[A] 

expresses that action a normally implies effect e in the delay A, unless there is an 
occurrence of an event inhibiting the effect e. The formalisation of such notion needs 
nonmonotonic reasoning which is expressed by means of action norm and inhibiting 
events. 

Action norm is defined as the set of propositions which must normally be true in 
order to perform the action. Formally, the norm is defined as a function : 



* More details can be found in [5] and [9]. 
. T FOR(£)

norm . A L 

where norm(a) contains the qualifications of action a. 

The set of the external events that are susceptible to inhibit the effect of the action 
a is defined as a function : 

inhibit : L1T(E) A 2“'^^. 
where e ’ inhibit(e,a) iff e ’ inhibits the effect e of action a. 



3. Ramification in Normative Method of Causality 

In this section, we present an approach to tackle action ramifications in the normative 
method of causality. To compute indirect effects, we introduce directed relations 
between two single effects called indirect effect relationships. Before defining these 
relationships, we augment the language of the normative method of causality with a 
new predicate : 

occ{e,l,d), with the intented meaning that the effect e has been generated in time 
line I at the date d. We have : V ed,d : occ(e,l,d) v(e,l,d). 

After defining occ, we define formally normal implication. 

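Definition 1. Action $a$ normally implies effect $e$ ($e \in LIT(E)$) within a delay $\Delta$
(noted $a \Rightarrow e[\Delta]$) iff: $(\forall l, t)\,(t \in l \to (C1 \wedge C2))$,
where C1 and C2 abbreviate the following conditions:

$C1 \equiv \{[v(a,l,d_t) \wedge \forall p\,(p \in norm(a) \to v(p,l,d_t))] \to (\forall l' \in Lp(l,d_t))\,(C11 \vee C12)\}$

with: $C11 \equiv \{(\exists t')\,(t' \in l' \wedge d_t \le d_{t'} \le d_t + \Delta \wedge occ(e,l',d_{t'}))\}$

$C12 \equiv \{(\exists e', t'')\,(e' \in inhibit(e,a) \wedge t'' \in l' \wedge v(e',l',d_{t''}) \wedge d_t \le d_{t''} \le d_t + \Delta)\}$

$C2 \equiv \{v(\neg a,l,d_t) \to (\exists l' \in Lp(l,d_t))\,(\forall d)\,(d_t \le d \le d_t + \Delta \to \neg v(e,l',d))\}$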

C1 expresses that whenever a is performed under normal conditions (i.e., all the
propositions in norm(a) are true), either (C11) e occurs in all preferred futures during
the delay $\Delta$, or (C12) there has been an event e' after t and within the delay $\Delta$ which
has inhibited effect e of action a. C2 means that if a is not performed, there exists at
least one preferred future where e does not occur during the delay $\Delta$.

Definition 2. A rule $a \Rightarrow e[\Delta]$ is called a causal rule. Causal rules are gathered in a
rule base called BR.

We introduce a new operator called the “indirect implication” operator, denoted “$\mapsto$”.
“$(e_1, e_2) \mapsto e[\Delta]$” expresses that the occurrence of effect $e_1$, in a situation in which
formula $e_2$ is true, indirectly implies the occurrence of effect $e$ within a delay $\Delta$. We
formally define this new implication as follows.

Definition 3. Effect $e_1$ indirectly implies effect $e$ in a situation in which formula $e_2$ is
true, within a delay $\Delta$ (noted $(e_1, e_2) \mapsto e[\Delta]$) iff

$(\forall l, d)\,(occ(e_1,l,d) \wedge v(e_2,l,d) \to (\exists d')\,(d \le d' \le d + \Delta \wedge occ(e,l,d')))$

Definition 4. An indirect effect relationship is an expression of the form:

$(e_1, e_2) \mapsto e[\Delta]$

where $e_1$ and $e$ are effect literals, $e_2$ is an effect formula, and $\Delta$ is a real.

Indirect effect relationships are gathered in a relationships base noted BEI.
If the delay $\Delta = 0$, the generated indirect effect e is said to be instantaneous.

Example 5. In a tennis match, if a player fails his service twice in succession, he
loses the point.

Consider the facts Service1Lost, Service2Lost and PointLost. The game rule can be
expressed by the following indirect effect relationship:

$(Service2Lost, Service1Lost) \mapsto PointLost$

which means that failing the second service implies losing the point, if the
first service was lost too.

Before computing action effects, we need to define the notion of effect persistence,
a duration which is independent of the causes of e's change.

Definition 6. $Persist(e,\delta)$, where $e \in LIT(E)$ and $\delta \in \mathbb{R}$, means that the effect e
normally persists during the delay $\delta$.²

This definition is not sufficient to deduce the truth value of the effect e during the
delay $\delta$. To do so, we need further information describing time lines.

Let DL be the set of formulas of the form v(e,l,d) or occ(e,l,d) which describe time 
lines. DL represents the set of world observations at particular moments. 

We introduce a new function called Closure, which generates the closure of a set of 
formulas of the form v(e,l,d) or occ(e,l,d), by exploiting persistence delays. 

Let $D = \{v(e,l,d) : e \in FOR(E) \wedge l \in L \wedge d \in \mathbb{R}\}$ and
$E = \{occ(e,l,d) : e \in FOR(E) \wedge l \in L \wedge d \in \mathbb{R}\}$.

Definition 7. The closure of a set of observations DL is defined by the function:

$Closure : 2^{D \cup E} \to 2^{D}$

Let $DL \in 2^{D \cup E}$ be a set of observations. The set Closure(DL) is defined by:

1. $\forall e \in LIT(E), l \in L, d \in \mathbb{R}$ : $v(e,l,d) \in Closure(DL)$ iff one of the following
conditions is verified:

i. $v(e,l,d) \in DL$, or

ii. $\exists d', \delta\,[\delta > d - d' \wedge v(e,l,d') \in Closure(DL) \wedge Persist(e,\delta) \wedge \forall d''\,(d' \le d'' < d \to v(\neg e,l,d'') \notin Closure(DL))]$, or

iii. $\exists l'\,(l' \in L \wedge v(e,l',d) \in Closure(DL) \wedge Coincide(l,l',d))$, or

iv. $occ(e,l,d) \in DL$.

2. $\forall e \in FOR(E), l \in L, d \in \mathbb{R}$ : $v(e,l,d) \in Closure(DL)$ if $v(e,l,d) \in DL$,

3. $\forall e_1, e_2 \in FOR(E), l \in L, d \in \mathbb{R}$ : $v(e_1 \wedge e_2,l,d) \in Closure(DL)$ if
$v(e_1,l,d) \in Closure(DL)$ and $v(e_2,l,d) \in Closure(DL)$.



2 The persistence duration of the effect e may differ from that of ¬e. For example, the
effect ¬Alive persists indefinitely whereas the effect Alive persists during a limited period.







M. Khelfallah and A. Mokhtari 



Point (1) treats only the case of effect literals. Intuitively, condition (1.i) means that if
an effect is true in DL, it is also true in Closure(DL). (1.ii) means that if an effect is true
in DL, it will be true in Closure(DL) during its persistence delay, unless its opposite
occurs. (1.iii) copies the contents of a line to all the lines that coincide with it. (1.iv)
uses the relation between occ and v.

Points (2) and (3) treat the case of effect formulas. Point (2) copies the contents of DL,
and (3) closes Closure(DL) by using the definition of the $\wedge$ operator.

This function gives a complete knowledge of the world from a set of
observations. Intuitively, the set DL contains the relevant information and Closure(DL)
represents all the information we can deduce from DL by exploiting persistence
delays.

$\forall e \in FOR(E), l \in L, d \in \mathbb{R}$ : e is true in the time line l at the date d, in a world
described by the observations set DL, iff $v(e,l,d) \in Closure(DL)$.

Having defined the function Closure, we now have to define DL, the set of time line
descriptions. Computing DL depends on the executed actions, because the latter are the
only things that can change the state of the world.

We are interested in computing all the (direct or indirect) effects of an action a
which is performed in a state described by a set of observations DL. Action a is
performed in the time line l at the date d. The (direct or indirect) generated effects
are gathered in the set EG.



Definition 8. The execution context of an action a is defined by the tuple $\langle DL, l, d, EG \rangle$
where:

• DL is the set of formulas v(e,l,d): observations on the state in which action a
is performed,

• $l \in L$ is the time line in which a is performed,

• $d \in \mathbb{R}$ is the date at which a is performed,

• EG is a set of formulas of the form occ(e,l,d) which represent the generated
effects in the time line l, at the date d.

Informally, generating the effects of action a consists in sequentially applying the
causal rules and indirect effect relationships associated with a. The order in which rules
or relationships are applied is not arbitrary. It depends on the effect generation date, i.e.,
a rule or a relationship is applied only if all the effects which should occur before its
effect have been generated. According to this criterion, effects are generated in the exact
order of their appearance. The generation stops when no new effect is generated.

We now formally define the applicability of causal rules and indirect effect
relationships.



3 Formulas including other logical operators can be transformed into formulas containing only
the $\neg$ and $\wedge$ operators.

4 In the particular case where two rules generate effects at the same moment, they are applied
sequentially without any priority (nondeterministic order). We suppose that the generated
effects are not contradictory. This hypothesis is necessary to guarantee the safeness of the
set of generated effects.

Definition 9. The rule $a \Rightarrow e[\Delta]$ is applicable in the context $\langle DL, l, d, EG \rangle$, with $d_e$
the appearance date of the effect e, iff the following conditions are verified:

1. $v(a,l,d)$,

2. $d_e = d + \Delta$,

3. $\forall p\,(p \in Norm(a) \to v(p,l,d) \in Closure(DL \cup EG))$,

4. $(\forall e')\,(e' \in Inhibit(e,a) \to (\forall d')\,(d \le d' \le d + \Delta : v(e',l,d') \notin Closure(DL \cup EG)))$,

5. $\forall (a \Rightarrow e'[\Delta']) \in BR$ (respectively, $((e_1, e_2) \mapsto e'[\Delta']) \in BEI$) applicable in the
context $\langle DL, l, d, EG \rangle$, with $d_{e'}$ the appearance date of the effect e':
$d_{e'} \ge d_e$,

6. this rule was not already applied in the same context.

(1) means that action a occurs in the time line l at the date d,

(2) means that the effect e occurs at the latest at the date $d + \Delta$,

(3) means that all of a's preconditions, listed in Norm(a), are verified in the context,

(4) expresses that inhibiting events of effect e must not occur in line l from a's
execution date during the delay $\Delta$,

(5) expresses that all effects appearing before the effect e have been generated (by
applying the associated rules or relationships), and

(6) means that this causal rule is applied at most once for each execution of action a.

Definition 10. The relationship $(e_1, e_2) \mapsto e[\Delta]$ is applicable in the context
$\langle DL, l, d, EG \rangle$, with $d_e$ the appearance date of the effect e, iff the following
conditions are verified:

1. $(\forall l', d')\,(l' \in Lp(l,d) \wedge d \le d' \to occ(e,l',d') \notin EG)$,

2. $(\exists l', d')\,(l' \in Lp(l,d) \wedge d \le d' \wedge occ(e_1,l',d') \in EG \wedge v(e_2,l',d') \in Closure(DL \cup EG))$,

3. $d_e = d + \Delta$,

4. $\forall (a \Rightarrow e'[\Delta']) \in BR$ (respectively, $((e_1, e_2) \mapsto e'[\Delta']) \in BEI$) applicable in the
context $\langle DL, l, d, EG \rangle$, with $d_{e'}$ the appearance date of the effect e':
$d_{e'} \ge d_e$.

Condition (1) checks whether the effect has already been generated. This condition
also guarantees that the indirect effect relationship is applied at most once for each
execution of action a. (2) checks that the effect $e_1$ has been generated and $e_2$ is true in
the context. (3) means that the effect e occurs after a maximum delay $\Delta$.

Example 11. [12] We have a circuit with 3 switches ($Sw_1$, $Sw_2$, $Sw_3$), a light (Light), a
relay (Relay), and a light detector (Detect). The light is activated iff the first and the
second switches are closed. The relay is activated only if the first and the third switches
are closed. Moreover, if the relay is activated, the second switch is opened. The
detector is activated if the light is activated.⁵

Indirect effects are represented by the following relationships.⁶

$(Sw_1, Sw_2) \mapsto Light\ [1s]$ (1)

$(\neg Sw_2, True) \mapsto \neg Light\ [1s]$ (2)

$(Sw_1, Sw_3) \mapsto Relay\ [2s]$ (3)

$(Relay, True) \mapsto \neg Sw_2\ [2s]$ (4)

$(Light, True) \mapsto Detect\ [1s]$ (5)

5 We suppose that the light and the detector are faster than the relay. Other cases can be easily
considered, such as the case where the relay is activated first (before the light), . . .

6 We give only the relationships we are interested in.



The action which closes the first switch is represented by the rule $Close_1 \Rightarrow Sw_1\ [1s]$.

We close the first switch in a state where both the second and third switches are closed
and the light, relay and detector are deactivated, i.e., $Close_1$ is executed in the context
$\langle DL, l, d, \emptyset \rangle$ where $DL = \{v(\neg Sw_1,l,d), v(Sw_2,l,d), v(Sw_3,l,d), v(\neg Light,l,d),
v(\neg Relay,l,d), v(\neg Detect,l,d)\}$.

The first rule to be applied is the causal rule. It generates $occ(Sw_1, l, d + 1s)$.⁷

The next to be applied is relationship (1), because its effect, Light, is generated
before the effects of the other relationships. The generated effect is
$occ(Light, l, d + 2s)$.

(3) and (5) are then both applicable, so the choice of which to apply first is
completely nondeterministic.

We choose to apply relationship (3) first, which generates $occ(Relay, l, d + 3s)$, and then
relationship (5), which generates $occ(Detect, l, d + 3s)$.

Thereafter, relationships (4) and (2) are applied in this order to generate respectively
$occ(\neg Sw_2, l, d + 5s)$ and $occ(\neg Light, l, d + 6s)$.

In the (deterministic) obtained state, the light detector is activated although there is
no light.
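
The generation order of Example 11 can be replayed with a small event queue. The following Python sketch is only an illustration under simplifying assumptions that are not part of the method itself: a single time line, dates counted in seconds from the execution date d, no norms and no inhibiting events, and contexts restricted to a single literal or True. The names `generate_effects`, `holds` and the "~" prefix for negated literals are hypothetical.

```python
import heapq
from itertools import count


def holds(state, literal):
    """Truth of a literal ("Sw2", "~Sw2" or "True") in a state of atoms."""
    if literal == "True":
        return True
    return state[literal.lstrip("~")] != literal.startswith("~")


def generate_effects(state, direct_effects, relationships):
    """Replay effect generation in order of appearance date.

    direct_effects: (literal, delay) pairs of the executed action.
    relationships:  (trigger, context, effect, delay) tuples.
    Returns the generated effects (EG) and the final state.
    """
    state = dict(state)
    tie = count()  # ties at the same date are broken by scheduling order
    queue = [(delay, next(tie), lit) for lit, delay in direct_effects]
    heapq.heapify(queue)
    generated, seen = [], set()
    while queue:
        date, _, lit = heapq.heappop(queue)
        if lit in seen:  # an effect is generated at most once
            continue
        seen.add(lit)
        generated.append((lit, date))
        state[lit.lstrip("~")] = not lit.startswith("~")
        for trigger, context, effect, delay in relationships:
            if trigger == lit and holds(state, context) and effect not in seen:
                heapq.heappush(queue, (date + delay, next(tie), effect))
    return generated, state


initial = {"Sw1": False, "Sw2": True, "Sw3": True,
           "Light": False, "Relay": False, "Detect": False}
relationships = [
    ("Sw1", "Sw2", "Light", 1),      # (1)
    ("~Sw2", "True", "~Light", 1),   # (2)
    ("Sw1", "Sw3", "Relay", 2),      # (3)
    ("Relay", "True", "~Sw2", 2),    # (4)
    ("Light", "True", "Detect", 1),  # (5)
]
# Close1 is executed at date 0 and produces Sw1 after 1s.
generated, final = generate_effects(initial, [("Sw1", 1)], relationships)
```

With this tie-breaking, Relay is generated before Detect at date 3, which corresponds to the choice made in the example; the other order would be equally admissible.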



4. Systematic Generation of Indirect Effect Relationships 

Our method computes successfully action ramifications (i.e., does not generate 
unexpected effects) thanks to an adequate set of indirect effect relationships. The 
latter are not written by a reasoning system designer but they are generated 
automatically from domain constraints and influence information which is useful to 
limit the changes deduced from domain constraints to the desired ones. 

Domain constraints are effect formulas which represent static dependencies that 
exist between domain effects. In addition to the fact that they are concise, they are 
written in a natural and easy manner. Domain constraints are gathered in a set noted 
CD. 



7 We suppose that effects appear at the date d + delay.

Definition 12. Influence information is defined on $LIT(E) \times LIT(E) \times \mathbb{R}$, where
$(e_1, e_2, \Delta) \in I$ means that the change of $e_1$'s value potentially changes $e_2$'s value, in a
maximum delay $\Delta$. The set of all influence information is noted I.

Indirect Effect Relationships Automatic Generation Algorithm

Input: the set of influence information I;
the conjunctive normal form $D_1 \wedge \ldots \wedge D_n$ of the conjunction of domain constraints of
CD, CNF(CD).

Output: the set BEI of indirect effect relationships

BEI := {}.

For each constraint $D_i = e_1 \vee \ldots \vee e_{m_i}$ of CNF(CD), $i = 1, \ldots, n$

For each $j = 1, \ldots, m_i$

For each $k = 1, \ldots, m_i$, $k \neq j$

If $(\neg e_j, e_k, \Delta) \in I$, add to BEI the relationship:
$(\neg e_j, \bigwedge_{l = 1, \ldots, m_i;\ l \neq j,\ l \neq k} \neg e_l) \mapsto e_k[\Delta]$.

Actually, this algorithm is an extension of Thielscher's algorithm for the automatic
generation of causal relationships [12].

First, the extension consists in introducing the notion of delay in influence 
information which allows us to handle delayed indirect effects. 

The second extension is that influence information is defined on the set of effect 
literals instead of the set of effects. This extension allows us to deal with examples 
like Toulouse’s suitcase [1]. 

Example 13. Toulouse's suitcase is an enhancement of Lin's suitcase [6], which is
open if and only if its two latches are open. In addition, there is an automatic
mechanism which ties the two latches, so that opening the first one opens the
second one.

We have the following domain constraints:

$L_1 \wedge L_2 \leftrightarrow Open$,

$L_1 \to L_2$,

$I = \{(L_1, Open, 2s), (\neg L_1, \neg Open, 2s), (L_2, Open, 2s), (\neg L_2, \neg Open, 2s), (L_1, L_2, 1s)\}$.

From the domain constraints and the set of influence information I, the algorithm
described above generates intuitive indirect effect relationships. In particular, the
undesirable relationship (generated by Thielscher's method) “$(L_1, \neg Open) \mapsto \neg L_2\ [1s]$”
is not generated. This is due to the absence of the influence information $(L_1, \neg L_2, 1s)$,
because $L_1$ cannot close $L_2$.
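
The generation algorithm can be sketched in a few lines of Python. This is an illustrative reading rather than the authors' implementation: clauses are lists of literals with "~" marking negation, influence information is a dictionary from (source literal, target literal) to a delay in seconds, and the suitcase constraints are assumed to be "L1 and L2 iff Open" and "L1 implies L2", written by hand in clausal form. The name `generate_relationships` is hypothetical.

```python
def neg(literal):
    """Negate a literal written as "L1" or "~L1"."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def generate_relationships(clauses, influence):
    """For each clause e_1 v ... v e_m and each pair j != k such that
    (not e_j, e_k, delay) is influence information, emit the relationship
    (not e_j, context) -> e_k [delay], where the context is the conjunction
    of the negations of the remaining literals (empty tuple = True)."""
    relationships = []
    for clause in clauses:
        for j, e_j in enumerate(clause):
            for k, e_k in enumerate(clause):
                if k == j:
                    continue
                delay = influence.get((neg(e_j), e_k))
                if delay is None:
                    continue
                context = tuple(neg(e) for i, e in enumerate(clause)
                                if i not in (j, k))
                relationships.append((neg(e_j), context, e_k, delay))
    return relationships


# Assumed clausal form of the suitcase constraints.
clauses = [
    ["~L1", "~L2", "Open"],  # L1 and L2 imply Open
    ["~Open", "L1"],         # Open implies L1
    ["~Open", "L2"],         # Open implies L2
    ["~L1", "L2"],           # L1 implies L2
]
influence = {
    ("L1", "Open"): 2, ("~L1", "~Open"): 2,
    ("L2", "Open"): 2, ("~L2", "~Open"): 2,
    ("L1", "L2"): 1,
}
bei = generate_relationships(clauses, influence)
```

Because the pair ("L1", "~L2") is absent from `influence`, the relationship from L1 (in context ~Open) to ~L2 is never produced, which is the behaviour the example relies on.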



5. Related Work 

We have presented a method to compute action ramifications. For this purpose, we
introduced indirect effect relationships, which are inspired by Thielscher's causal
relationships [12]. However, our method is more general than the latter, in particular
in handling delayed effects.

The influence information we use in the systematic generation of indirect effect
relationships is defined on the set of effect literals, whereas Thielscher's is
defined on the set of effects. This difference allows us to handle correctly the Toulouse
suitcase example [1].

By imposing a priority on effect generation, our method gets closer to the
method of Van Belleghem et al. [14], which is based on the induction principle. It
computes action effects in their order of appearance by stratifying (direct and indirect)
effect rules. An effect is generated only if all the effects of the previous stratum have
been computed. The equivalent of this law in our approach is the priority imposed on
the application of causal rules and indirect effect relationships. This priority is based on
the order of appearance of effects.

Like the method of [14], many other methods [8], [11] are based on the induction
principle. Nevertheless, the former requires syntactical restrictions to guarantee the
stratification.

Shanahan [10] has introduced ramifications into the event calculus in a natural way. In
spite of the similarity of his method to that of [14], it is less general, because it does
not handle mutually dependent effects.

Delayed effects in [4] are handled in a different manner than here. They are 
represented by means of direct effects and causal rules which provide indirect effects. 

Generally, our method tackles concurrent actions without any extension. The only 
condition to guarantee the consistency of the obtained results is the non existence of 
actions with opposite effects executed simultaneously in the same time line. No 
extension is needed, due to the priority we impose in applying rules and relationships. 



6. Conclusion 

We have presented a method to tackle indirect effects of actions in the context of the 
normative method of causality. To this end, we have introduced directed relations 
defined between effect literals, called indirect effect relationships. 

As their name suggests, indirect effect relationships are used to generate
indirect effects of actions. These effects are generally delayed ones, i.e., their
occurrence is not necessarily instantaneous. To represent this property, indirect effect
relationships are of the form $(e_1, e_2) \mapsto e[\Delta]$, which means that the occurrence of the
effect $e_1$, in a situation in which the formula $e_2$ is true, provokes the occurrence of the
effect $e$ within a maximum delay $\Delta$.

We propose an algorithm which systematically generates such
relationships from domain constraints and further information called influence
information.

For lack of space, we have not addressed an important part of our method. It 
concerns implicit qualifications and incomplete descriptions of the world. Implicit 
qualifications are deduced by exploiting domain constraints as qualification 
constraints [7], [12]. Domain constraints are also used to complete the set of 
observations about the world as it is done in [3]. 

We handle neither actions with non-deterministic effects nor complex actions. A
subset of concurrent actions is tackled. It is the set of actions with independent
effects.

Thus, we plan to extend our method to handle a larger domain, which could
contain non-deterministic, concurrent actions and continuous change.

Abstract. Existing methods for representing conflicting simultaneous
events employ the notion of cancellations, which are used in order to
stipulate that the effects of certain events cancel the effects of other,
conflicting, events. However it is argued that this technique is inadequate
when it comes to the representation of conflicting defeasible events.
Consequently event preferences are suggested. These can be used to indicate
which of two conflicting events normally succeeds; and thus, in effect,
which of the two conflicting events normally cancels the effects of the
other.



1 Introduction 

This paper is concerned with conflicts which arise between simultaneous defea- 
sible events. Previous work, notably that of Gelfond, Lifschitz and Rabinov [3], 
and Lin and Shoham [5], which has been summarized by Shanahan [6, Sec. 10.4], 
has dealt with simultaneous events which are generally not defeasible but whose 
effects may be cancelled by other events. The notion of cancellation is motivated 
by the example of lifting a bowl full of soup. Thus, if only one side (the “left” side 
or the “right” side) of the bowl is lifted, then the soup is spilt. However, if both 
sides of the bowl are lifted simultaneously, then the soup is not spilt. Thus the 
complex lift-both-sides action cancels the effects of its component, elementary, 
lift-left and lift-right actions. Cancellations are realized in the Situation Calculus 
by introducing and minimizing a Cancels predicate, together with explicit can- 
cellation axioms. Thus, in the example, a lift-left action has the effect of spilling 
the soup, but if it occurs as part of a complex lift-both-sides action, then the 
lift-right component cancels this effect; and similarly for a lift-right action. 

However this approach is limited by the assumption that events are otherwise 
not defeasible; that is, that, cancellations aside, if the preconditions of an event 
hold when it occurs, then it always succeeds and its effects are guaranteed. In 
order to see the problems which arise when this assumption is lifted, suppose 
that one of the components of the complex lift-both-sides action fails; because a 
hand slips, etc. Then the failing component action should not cancel the effects 
of the other component action, and the soup should still be spilt. But how can a 
cancellation be cancelled? It may be argued that this can be done by introducing
a more complex lift-both-sides-with-one-slip action, which has an additional slip-left
or slip-right component action. But to do so is to embark on an endless task.
What happens if both hands slip; what if another agent interferes; what if the 
soup bowl is being lifted on a ship which may pitch or roll in such a way that, 
regardless of which lift events occur, the soup is (is not) spilt, etc? 

Clearly a more sophisticated approach is required for simultaneous defeasible 
events. Consider an example which involves two conflicting elementary events 
occurring simultaneously. Two agents, Stan and Ollie, attempt to move to the 
same location simultaneously but only one of them can succeed. Suppose further 
that Ollie’s success is more likely than Stan’s, say because he is bigger. Then 
we want to say that when these events conflict it is normally the case that 
Ollie succeeds and Stan fails, and consequently it is normally the case that the 
move-Ollie event effectively cancels the effects of the move-Stan event. However, 
if Ollie fails for some independent reason, if he slips say, then the move-Ollie 
event should not prevent the move-Stan event from succeeding; although, of 
course, Stan may also slip, etc. Similarly, if a complex action, such as lifting the 
soup bowl with both hands simultaneously, occurs, then we want to say that this 
normally succeeds and effectively cancels any conflicting effects of its component 
events. Consequently this paper suggests the notion of event preferences, which 
can be used to indicate which of two events should normally succeed when they 
conflict. 

Event preferences are introduced as an extension of the causal theories de- 
veloped in [1]. Consequently, in order to set the scene, a brief account of causal 
theories is given in Section 2. Event preferences are then added in Section 3, 
where a suitable pragmatics is defined and examples are given. 



2 Causal Theories 

The causal theories developed in [1] provide a unified means for representing pre- 
dictive common sense reasoning about actual events and their effects (including 
the representation of inertia, qualifications, ramifications, and non-determinism) 
and they can be used as the basis of a theory of counterfactual events, which in 
turn can be used to represent explanatory common sense reasoning [2]. 

Causal theories are expressed in a language called the Temporal Calculus,
or simply TC. The formal syntax and semantics of TC are presented in [2];
consequently the language will be introduced informally as required in this paper.
TC is a three-valued, temporal language which permits first-order and limited
higher-order quantification. Its propositional basis is provided by Kleene's strong
three-valued language [4], which is given an epistemic, resource-bounded
interpretation. Accordingly, a sentence can be true (established by the reasoner(s) as
being true), false (established by the reasoner(s) as being false) or undefined
(ignored by the reasoner(s) as irrelevant, or unestablishable within the reasoner(s)
resource limitations). In keeping with these intuitions the truth conditions for
the logical constants yield a Boolean truth value wherever possible. For example,
the sentence $\neg\phi$ is true if $\phi$ is false, is false if $\phi$ is true, and is undefined otherwise.
Similarly, the sentence $\phi \wedge \psi$ is true if $\phi$ and $\psi$ are both true, is false if either is
false, and is undefined otherwise. The analogues of classical disjunction, $\vee$, and
material implication, $\supset$, can be defined in the usual way; thus $\phi \vee \psi$ is defined
as $\neg(\neg\phi \wedge \neg\psi)$, and $\phi \supset \psi$ is defined as $\neg\phi \vee \psi$. In order to increase the
expressiveness of Kleene's language the undefined operator, ?, is added. Informally the
sentence $?\phi$ states that the truth value of $\phi$ is undefined; that is, that neither
the truth nor the falsity of $\phi$ is established. The following additional operators
can now be defined:

$\circ\phi \stackrel{def}{=} {?\phi} \vee \phi$
$\bullet\phi \stackrel{def}{=} {?\phi} \vee \neg\phi$
$!\phi \stackrel{def}{=} \neg{?\phi}$
$\phi \to \psi \stackrel{def}{=} \bullet\phi \vee \neg\bullet\psi$
$\phi \equiv \psi \stackrel{def}{=} (\neg\bullet\phi \wedge \neg\bullet\psi) \vee (\neg\circ\phi \wedge \neg\circ\psi) \vee ({?\phi} \wedge {?\psi})$

Thus $\circ\phi$ states that $\phi$ is not false, $\bullet\phi$ states that $\phi$ is not true, $!\phi$ states that $\phi$
is defined (is not undefined), $\phi \to \psi$ is a resource-bounded conditional which is
true if $\psi$ is true or $\phi$ is not, and $\phi \equiv \psi$ states that $\phi$ and $\psi$ are equivalent (are
both true, or both false, or both undefined).

As space is limited, the treatment of causal theories will be restricted to the
theory of primary events. These can be thought of as defeasible STRIPS events.
Thus they are defined by specifying their preconditions and postconditions; for
example a Move event is defined by axioms (7) and (8) in the next section. The
axiom of change then states that if event e occurs at time t and the preconditions
of e are true at t and it is not true that e is qualified at t, then the postconditions
of e are true at t + 1:¹

$Pre(e)(t) \wedge Occ(e)(t) \wedge \bullet Qual(e)(t) \to Post(e)(t+1)$ (1)

Intuitively, e is qualified at t if there is some reason why e should not succeed at
t. The intention is to use this axiom positively whenever possible: given $Pre(e)(t)$
and $Occ(e)(t)$, $\neg Qual(e)(t)$ should be assumed and the axiom used to conclude
$Post(e)(t+1)$, if doing so is consistent. Thus, on the intended interpretation of
the axiom, events normally succeed if their preconditions are true when they
occur. Qualifications apply only to events which would otherwise succeed:

$Qual(e)(t) \to Pre(e)(t) \wedge Occ(e)(t)$ (2)

Thus assumptions of the form $\neg Qual(e)(t)$ are called change assumptions, and
the distinction between the occurrence of an event and its success is underlined
by the following axiom:

$Succ(e)(t) \equiv Pre(e)(t) \wedge Occ(e)(t) \wedge \bullet Qual(e)(t)$ (3)
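
The three-valued connectives and the derived operators introduced in this section can be tabulated mechanically. The Python sketch below is illustrative only: it represents the undefined value by `None`, and it assumes that the undefined operator ? itself always yields a Boolean. The function names are hypothetical.

```python
def k_not(p):
    """Kleene negation; None stands for 'undefined'."""
    return None if p is None else not p


def k_and(p, q):
    """Kleene strong conjunction: false if either conjunct is false."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True


def k_or(p, q):
    """Disjunction defined as not(not p and not q)."""
    return k_not(k_and(k_not(p), k_not(q)))


def undef(p):
    """?p: true exactly when p is undefined (assumed two-valued)."""
    return p is None


def not_false(p):
    """Circle operator: ?p or p."""
    return k_or(undef(p), p)


def not_true(p):
    """Bullet operator: ?p or not p."""
    return k_or(undef(p), k_not(p))


def defined(p):
    """!p: not ?p."""
    return k_not(undef(p))


def cond(p, q):
    """Resource-bounded conditional: bullet p or not bullet q."""
    return k_or(not_true(p), k_not(not_true(q)))


def equiv(p, q):
    """Both true, or both false, or both undefined."""
    both_true = k_and(k_not(not_true(p)), k_not(not_true(q)))
    both_false = k_and(k_not(not_false(p)), k_not(not_false(q)))
    both_undef = k_and(undef(p), undef(q))
    return k_or(k_or(both_true, both_false), both_undef)
```

Under these assumptions every derived operator returns `True` or `False` even when its arguments are `None`; for instance `cond(True, None)` is `False` and `cond(None, False)` is `True`.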



1 As is customary, terms starting with upper case letters are used for constants and
terms starting with lower case letters are used for variables. Customary abbreviations
are also used. In particular universal quantifiers are omitted. Free variables should
thus be understood as being bound by universal quantifiers with wide scope.

Inertia is represented by means of a common sense inertia axiom. Elementary
facts concerning the domain are represented by first-order atomic sentences of
the form $r(u_1, \ldots, u_n)(t)$, where the $u_i$ are terms denoting domain objects, and
t denotes a time point. Thus, if the relation $r(u_1, \ldots, u_n)$ is true (false) at time
t and there is no reason to doubt that its truth (falsity) persists, then we should
conclude by default that it does so; that is, that $r(u_1, \ldots, u_n)(t+1)$ is true (false).
In order to formalize this intuition, the non-temporal component $r(u_1, \ldots, u_n)$
of a first-order atom $r(u_1, \ldots, u_n)(t)$ is called a Kleene atom, and a Kleene literal
is either a Kleene atom or its negation. Then, for Kleene literal $\ell$ and time point
t, $Aff(\ell)(t)$ states that $\ell$ is affected at t, that is, that there is reason to doubt
that the truth value of $\ell$ persists beyond t. The inertia axiom can now be
stated as follows:

$\ell(t) \wedge \bullet Aff(\ell)(t) \to \ell(t+1)$ (4)

Thus the axiom states that if the Kleene literal $\ell$ is true at time t and it is not
true that $\ell$ is affected at t, then $\ell$ remains true at t+1. The intention is that the
axiom should be used positively whenever possible: given $\ell(t)$, $\neg Aff(\ell)(t)$ should
be assumed and the axiom used to conclude $\ell(t+1)$ if doing so is consistent. In
keeping with this interpretation, assumptions of the form $\neg Aff(\ell)(t)$ are called
inertia assumptions.

Definition 1. The theory of primary events, $\Theta_P$, consists of the axioms {(1),
. . . , (4)}. Theories which contain $\Theta_P$ are called causal theories.

The intended interpretation of causal theories, and particularly of the
defeasible change and inertia axioms occurring in them, is given by their formal
pragmatics, which specifies a selected subset of the models of each causal theory,
and which is appropriate to the extent that the selected models of the theory
coincide with the intended models of the theory. The pragmatics is based on the
principle of chronological minimization, suggested by Shoham [7], but refines
this to what might be called prioritized chronological minimization: the
minimization is still chronological; however, at each time point, facts and events are
minimized before qualifications, which in turn are minimized before affectations.
As the pragmatics is intended for causal theories and these contain the theory
of primary events, $\Theta_P$, the models it selects will be referred to as P-preferred
models.

For TC-model M and time point t, let $M_R/t$ ($M_O/t$, $M_Q/t$, $M_A/t$) be the
set of first-order (Occ, Qual, Aff) atomic sentences which are defined in M up
to t:

$M_R/t = \{r(u_1, \ldots, u_n)(t') : t' \le t \text{ and } M \models {!r(u_1, \ldots, u_n)(t')}\}$,

$M_O/t = \{Occ(e)(t') : t' \le t \text{ and } M \models {!Occ(e)(t')}\}$,

$M_Q/t = \{Qual(e)(t') : t' \le t \text{ and } M \models {!Qual(e)(t')}\}$,

$M_A/t = \{Aff(\ell)(t') : t' \le t \text{ and } M \models {!Aff(\ell)(t')}\}$.

For convenience, unions formed from these sets will be denoted by the
juxtaposition of their subscripts; thus, for example, $M_{ROQ}/t$ denotes the set
$M_R/t \cup M_O/t \cup M_Q/t$.

Definition 2. Let M and M' be TC-models which differ only on the
interpretation of first-order relations and the Occ, Aff and Qual predicates. Then M is
P-preferred to M' (written $M \prec_P M'$) if and only if there exists a time point t
such that:

— $M_{RO}/t \subset M'_{RO}/t$,

— or $M_{RO}/t = M'_{RO}/t$ and $M_Q/t \subset M'_Q/t$,

— or $M_{ROQ}/t = M'_{ROQ}/t$ and $M_A/t \subset M'_A/t$.

A model M is a P-preferred model of a sentence $\phi$ if and only if $M \models \phi$ and
there is no other model M' such that $M' \models \phi$ and $M' \prec_P M$. M is a P-preferred
model of a set of sentences $\Theta$ if and only if $M \models \Theta$ and there is no other model
M' such that $M' \models \Theta$ and $M' \prec_P M$. A causal theory $\Theta$ P-predicts a sentence
$\phi$, written $\Theta \mid\!\approx_P \phi$, if and only if $\phi$ is true in all P-preferred models of $\Theta$.

The pragmatics seems natural. At each time point, the facts and events which 
follow from the pragmatic interpretation of the theory at earlier time points 
are fixed before any assumptions are made regarding the future. This has the 
effect that speculating about the future cannot alter the present (or the past). 
Then qualifications are minimized (change assumptions are maximized) before 
minimizing affectations (maximizing inertia assumptions). This has the effect 
that if (instances of) the change and inertia axioms conflict, then preference is 
given to the change axiom, and consequently, where there is a conflict, change 
is preferred to inertia. 

3 Simultaneous Events 

We are now in a position to consider simultaneous defeasible events. Recall the 
introductory example where two agents, Stan and Ollie, attempt to move to the 
same location simultaneously. Other things being equal, the two events conflict, 
and the result is chaos; that is, it is impossible to predict the outcome. 

Example 1. (Two stooges) At time 1: Ollie is at location PI, Stan is at location 
P3 and both attempt to move to location L2: 



At{0,Ll){l) AAt{S,L3){l) (5) 

Occ(Moue(0, PI, P2))(l) A Occ{Move{S,L3,L2)){l) (6) 

Pre{Move{x,l,l')){t) = At{x,l)(t) (7) 

Post{Move{x, I, l')){t) = At{x, l'){t) A -•At{x, l){t) (8) 

At{x,l){f) A X y ^ ~^At{y, 1) {t) (9) 

PAA[S',0,P1,P2,P3] (10) 



Here (9) is a domain axiom which states that two different objects cannot be at 
the same location at the same time, and (10) states that the names S, O, PI, 

L2 and P3 are unique; formally, UNA[ui, . . . — f\Ui uj for 1 < z < j < n. 
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Let = Op U {(5), . . . , (10)}. Then there are P-preferred models of Oi in 
which Ollie succeeds in getting to L2 at time 2 and Stan remains at L3. There 
are also P-preferred models of 0\ in which Stan succeeds in getting to L2 at 
time 2, and Ollie remains at PI. Consequently neither Oi [«p At(0, L2)(2) nor 
6»i At{S,L2){2). 

Proof. The two move events which occur at time 1 conflict. If
$Occ(Move(O,L1,L2))(1)$ succeeds, then it follows from the change axiom and
axiom (8) that $At(O,L2)(2)$ is true. Similarly if $Occ(Move(S,L3,L2))(1)$
succeeds, then it follows from the change axiom and axiom (8) that
$At(S,L2)(2)$ is true. However, in view of axioms (9) and (10), if $At(O,L2)(2)$
is true, then $At(S,L2)(2)$ is false, and vice-versa. Thus at least one of the
two events fails, and it follows from the contrapositive of the change axiom
that at least one of $Qual(Move(O,L1,L2))(1)$ and $Qual(Move(S,L3,L2))(1)$ is
true in any model of $\Theta_1$.

Clearly also there are models of $\Theta_1$ in which one of the change assumptions
$\neg Qual(Move(O,L1,L2))(1)$ and $\neg Qual(Move(S,L3,L2))(1)$ is true. Indeed, as
qualifications are minimized before affectations (as change is preferred to inertia)
at each time point in P-preferred models, one of these change assumptions is
true in any P-preferred model of $\Theta_1$.

So, let M be a P-preferred model of $\Theta_1$ in which $\neg Qual(Move(O,L1,L2))(1)$
is true. Then it follows, by the success axiom and (5)-(7), that
$Succ(Move(O,L1,L2))(2)$ is true in M. And it follows by (5)-(8) and the change
axiom that $At(O,L2)(2)$ is true in M. Consequently it follows, by the argument
above, that $Qual(Move(S,L3,L2))(1)$ is true in M. As M is a P-preferred model of
$\Theta_1$, the inertia assumption $\neg Aff(At(S,L3))(1)$ is true in M, so it follows by
the inertia axiom that $At(S,L3)(2)$ is true in M.

Analogous reasoning shows that there are also P-preferred models of $\Theta_1$ in
which $At(S,L2)(2)$ and $At(O,L1)(2)$ are true. □
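
The case analysis in this proof can be mimicked by a toy enumeration. The Python sketch below is an illustration only and rests on strong simplifying assumptions: it ignores time, inertia and the full model-theoretic definition of P-preference, and represents a model solely by the set of move events that are qualified in it. The names `EVENTS`, `consistent` and `minimal_qualification_sets` are hypothetical helpers introduced for this sketch.

```python
from itertools import combinations

# The two conflicting move events of Example 1 (labels only).
EVENTS = ["Move(O,L1,L2)", "Move(S,L3,L2)"]


def consistent(qualified):
    """Toy stand-in for axioms (9) and (10): both moves cannot succeed,
    so at least one of the two events has to be qualified."""
    return len(qualified) >= 1


def minimal_qualification_sets():
    """Return the consistent qualification sets that are minimal under
    set inclusion (a crude proxy for minimizing qualifications)."""
    candidates = [frozenset(c)
                  for r in range(len(EVENTS) + 1)
                  for c in combinations(EVENTS, r)]
    admissible = [q for q in candidates if consistent(q)]
    return [q for q in admissible
            if not any(p < q for p in admissible)]


if __name__ == "__main__":
    for q in minimal_qualification_sets():
        print(sorted(q))
```

Under these assumptions there are two incomparable minimal sets, one qualifying Ollie's move and one qualifying Stan's, which mirrors the two families of P-preferred models described in the proof.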

Example 1 can be elaborated by stating that when conflicting move events 
occur Ollie succeeds, say because he is bigger: 

Example 2. Let $\Theta_2 = \Theta_1 \cup \{(11)\}$ where:

$$Occ(Move(O,l,l'))(t) \land Occ(Move(S,l'',l'))(t) \rightarrow \neg Succ(Move(S,l'',l'))(t) \tag{11}$$

Then, as Stan’s attempt to move fails in all models of $\Theta_2$, Ollie’s attempt to
move succeeds in all P-preferred models of $\Theta_2$;
$\Theta_2 \mathrel{|\!\approx_{P}} At(O,L2)(2)$. □

In this example axiom (11) has the desired effect. If Stan and Ollie both 
attempt to move to the same location, then Stan fails and Ollie succeeds, and 
so the conflict is resolved in Ollie’s favour; thus, in the terminology used in 
the introduction, the axiom cancels Stan’s movement. However if there is some 
independent reason why Ollie fails, say because he slips, then the axiom has 
the unfortunate consequence that Stan also fails; as the cancellation of Stan’s 
movement is uncancellable. Clearly what is needed is some means of saying that 
Ollie’s success is preferred to Stan’s success, without it being the case that Stan’s
failure is prohibited should Ollie fail for some independent reason. 

In order to be able to do so the relation Pref is introduced. Thus the event 
preference Pref{e, e')(t) should be thought of as stating that if events e and e' 
occur at time t, then the success of e is preferred to the success of e' at t; that 
if e and e' conflict at t, then the success of e is more likely than the success 
of e' . Thus if there is no independent reason why e should fail (if there is no 
reason other than the occurrence of e'), then e should succeed and e' should fail. 
However, if e does fail for some independent reason, then the occurrence of e 
should not, of itself, prevent e' from succeeding. The conflict resolution axiom 
(11) can thus be replaced by the following one: 

$$Occ(Move(O,l,l'))(t) \land Occ(Move(S,l'',l'))(t) \rightarrow Pref(Move(O,l,l'),Move(S,l'',l'))(t)$$

Event preferences are required to be asymmetric: 

$$Pref(e,e')(t) \rightarrow \neg Pref(e',e)(t) \tag{12}$$

However, in the interests of generality, no further conditions are imposed; al- 
though conditions such as transitivity can, of course, be added when they are 
appropriate. 

Definition 3. The theory of event preferences, $\Theta_Q$, consists of the axiom (12),
and the theory of primary events with event preferences, $\Theta_{PQ}$, is $\Theta_P \cup \Theta_Q$.

When causal theories contain $\Theta_Q$ and event preferences their pragmatics
needs to be refined further. In particular, if models M and M' disagree on 
qualifications at time t, then M may be preferred to M' on the basis of event 
preferences. This may be because fewer event preferences are defined in M than 
in M' at t, or because M satisfies the event preferences which are applicable at t 
better than M' does. This better-satisfies relation on applicable event preferences 
is defined next. 

For any model M and time point t, let $M_P(t)$ be the set of event preferences
which are applicable (in M at t):

$$M_P(t) = \{Pref(e,e')(t) : M \models Pref(e,e')(t) \land Pre(e)(t) \land Occ(e)(t) \land Pre(e')(t) \land Occ(e')(t)\},$$

and call the events referred to in these event preferences the preferential events
(in M at t). Then the degree of each of the applicable event preferences can be
defined in terms of the degree of the leftmost preferential event referred to in it.
The ordering on preferential events is indicated as follows:

$$e \prec_Q e' \text{ iff } Pref(e,e')(t) \in M_P(t),$$
$$e \lessdot_Q e' \text{ iff } e \prec_Q e' \land \neg\exists e''(e \prec_Q e'' \land e'' \prec_Q e').$$

Let:

$$E_1 = \{e : \exists e'\, e \prec_Q e' \land \neg\exists e'\, e' \prec_Q e\}$$

and, for $n > 1$, let:

$$E_n = \{e : \exists e'(e' \in E_{n-1} \land e' \lessdot_Q e \land e \notin \bigcup_{i=1}^{n-1} E_i)\}.$$

Then the degree of preferential event $e \in E_i$ is $i$ and, for any $e'$ such that
$e \prec_Q e'$, the degree of the corresponding event preference $Pref(e,e')(t)$ is
also $i$. Now, an applicable event preference $Pref(e,e')(t)$ is satisfied if
$\neg Qual(e)(t)$ is true and is violated if $Qual(e)(t)$ is true. Consequently, if:

$$M_Q(t) = \{Qual(e)(t) \in M_Q/t : Pref(e,e')(t) \in M_P(t) \lor Pref(e',e)(t) \in M_P(t)\}$$

is the set of preferential events which are qualified in M at t, then the set:

$$M_P(t)/n = \bigcup_{i=1}^{n} E_i \cap \{e : Qual(e)(t) \in M_Q(t)\}$$

consists of those preferential events of degree at most $n \ge 1$ whose corresponding
event preferences are violated in M at t. So, suppose that M and
M' are models which agree on which event preferences are applicable at some
time t; suppose, that is, that $M_P(t) = M'_P(t)$. Then M satisfies the applicable
event preferences at t better than M' does if and only if, for some n, more of
the applicable event preferences of degree $m \le n$ are satisfied in M than in M':

$$M_Q(t) \prec_Q M'_Q(t) \text{ iff there is some } n \text{ such that } M_P(t)/n \subset M'_P(t)/n.$$

Moreover, M satisfies the applicable event preferences at t at least as well as
M' does if and only if either M satisfies these event preferences better than M'
does, or M and M' agree on the qualification of preferential events at t:

$$M_Q(t) \preceq_Q M'_Q(t) \text{ iff } M_Q(t) \prec_Q M'_Q(t) \text{ or } M_Q(t) = M'_Q(t).$$

Now, let:

$$M_P/t = \{Pref(e,e')(t') : t' \le t \text{ and } M \models Pref(e,e')(t')\}$$

be the set of atomic event preference sentences which are defined in M up to time
t. Then the definition of P-preferred model is extended to that of PQ-preferred
model as follows:

Definition 4. Let M and M' be TC-models which differ at most on the
interpretation of first-order relations and the Occ, Aff, Pref and Qual relations.
Then M is PQ-preferred to M' (written $M \prec_{PQ} M'$) if there exists a time point t
such that:

— $M_{RO}/t \subset M'_{RO}/t$,

— or $M_{RO}/t = M'_{RO}/t$ and $M_P/t \subset M'_P/t$,

— or $M_{ROP}/t = M'_{ROP}/t$ and either

   • $M_Q/t - M_Q(t) \subset M'_Q/t - M'_Q(t)$ and $M_Q(t) \preceq_Q M'_Q(t)$,

   • or $M_Q/t - M_Q(t) \subseteq M'_Q/t - M'_Q(t)$ and $M_Q(t) \prec_Q M'_Q(t)$,

— or $M_{ROPQ}/t = M'_{ROPQ}/t$ and $M_A/t \subset M'_A/t$.

Thus model M is preferred to model M' on the basis of event preferences at
time t if and only if either fewer event preferences are defined in M up to t, or
M and M' agree on defined event preferences up to t (and thus on applicable
event preferences at t: $M_P(t) = M'_P(t)$) and either:

— fewer events are qualified in M up to t if the preferential events at t are set
aside, and M satisfies the applicable event preferences at t at least as well
as M' does,

— or no more events are qualified in M up to t if the preferential events at t are
set aside, and M satisfies the applicable event preferences at t better than
M' does.

It is easy to check that $\prec_{PQ}$ is a partial order, and that PQ-preference
reduces to P-preference in the absence of event preferences.

Definition 5. A model M is a PQ-preferred model of a sentence $\phi$ if $M \models \phi$
and there is no other model M' such that $M' \models \phi$ and $M' \prec_{PQ} M$. M is a
PQ-preferred model of a set of sentences $\Theta$ if $M \models \Theta$ and there is no other
model M' such that $M' \models \Theta$ and $M' \prec_{PQ} M$. If $\Theta$ is a theory which contains
$\Theta_{PQ}$, then $\Theta$ PQ-predicts a sentence $\phi$, written $\Theta \mathrel{|\!\approx_{PQ}} \phi$, if and only if $\phi$ is
true in all PQ-preferred models of $\Theta$.

The following three examples illustrate various aspects of the formal defi- 
nitions, particularly the better-satisfies relation on event preferences. The first 
example illustrates the fact that event preferences are interpreted transitively 
wherever possible. 

Example 3. (Three stooges) Suppose that initially Ollie is at location L1, Stan is
at location L2, and Charlie is at location L3. Suppose also that they
simultaneously attempt to move to location L4 and that only one of them can succeed. If
size is taken as the deciding factor, then as Ollie is bigger than Stan and Stan is
bigger than Charlie, Ollie’s success is more likely than Stan’s and Stan’s success
is more likely than Charlie’s.

This scenario can be represented by the causal theory
$\Theta_3 = \Theta_{PQ} \cup \{(7),\ldots,(9),(13),\ldots,(17)\}$, where:

$$At(O,L1)(1) \land At(S,L2)(1) \land At(C,L3)(1) \tag{13}$$

$$Occ(Move(O,L1,L4))(1) \land Occ(Move(S,L2,L4))(1) \land Occ(Move(C,L3,L4))(1) \tag{14}$$

$$Pref(Move(O,L1,L4),Move(S,L2,L4))(1) \tag{15}$$

$$Pref(Move(S,L2,L4),Move(C,L3,L4))(1) \tag{16}$$

$$UNA[S,O,C,L1,L2,L3,L4] \tag{17}$$

Then, as desired, $\Theta_3$ PQ-predicts that Ollie succeeds and that Stan and Charlie
both fail; $\Theta_3 \mathrel{|\!\approx_{PQ}} At(O,L4)(2) \land At(S,L2)(2) \land At(C,L3)(2)$.

Proof. Suppose that M, M' and M'' are models of $\Theta_3$ which agree on the
minimal set of facts, events and event preferences up to time 1. Suppose, that
is, that:

$$M_{ROP}/1 = \{At(O,L1)(1), At(S,L2)(1), At(C,L3)(1)\}$$
$$\cup\ \{Occ(Move(O,L1,L4))(1), Occ(Move(S,L2,L4))(1), Occ(Move(C,L3,L4))(1)\}$$
$$\cup\ \{Pref(Move(O,L1,L4),Move(S,L2,L4))(1), Pref(Move(S,L2,L4),Move(C,L3,L4))(1)\},$$

and that $M_{ROP}/1 = M'_{ROP}/1 = M''_{ROP}/1$. Suppose also that, preferential
events aside, the three models agree on the minimal set of qualifications up to
time 1; thus $M_Q/1 - M_Q(1) = M'_Q/1 - M'_Q(1) = M''_Q/1 - M''_Q(1) = \emptyset$. Suppose
finally that the three models disagree on the qualification of preferential events
at time 1; thus:

$$M_Q(1) = \{Qual(Move(S,L2,L4))(1), Qual(Move(C,L3,L4))(1)\},$$
$$M'_Q(1) = \{Qual(Move(O,L1,L4))(1), Qual(Move(C,L3,L4))(1)\},$$
$$M''_Q(1) = \{Qual(Move(O,L1,L4))(1), Qual(Move(S,L2,L4))(1)\}.$$

This is in keeping with the assumption of minimality; as axioms (8), (9) and
(17) require that two of the three preferential events are qualified in any model
of $\Theta_3$. Now, as the three models agree on $M_{ROP}/1$ they also agree on which
event preferences are applicable at time 1. Thus:

$$M_P(1) = \{Pref(Move(O,L1,L4),Move(S,L2,L4))(1), Pref(Move(S,L2,L4),Move(C,L3,L4))(1)\},$$

and $M_P(1) = M'_P(1) = M''_P(1)$. Consequently the three models can be ordered
according to the better-satisfies relation. The degree of each of the preferential
events at time 1 is indicated as follows:

$$E_1 = \{Move(O,L1,L4)\},\quad E_2 = \{Move(S,L2,L4)\},\quad E_3 = \{Move(C,L3,L4)\}.$$

So $M_P(1)/1 = \emptyset$ and $M'_P(1)/1 = M''_P(1)/1 = \{Move(O,L1,L4)\}$. Consequently
$M_Q(1) \prec_Q M'_Q(1)$ and $M_Q(1) \prec_Q M''_Q(1)$. (Similarly, as
$M'_P(1)/2 = \{Move(O,L1,L4)\}$ and $M''_P(1)/2 = \{Move(O,L1,L4), Move(S,L2,L4)\}$,
$M'_Q(1) \prec_Q M''_Q(1)$.) So $M \prec_{PQ} M'$ and $M \prec_{PQ} M''$. Consequently M' is
not a PQ-preferred model of $\Theta_3$ and neither is M''. Moreover, assuming that
$M_A/1$ is minimal, assuming that is that $M_A/1 = \{Aff(At(O,L1))(1)\}$, M is
a PQ-preferred model of $\Theta_3$, and the desired conclusions follow in M by the
change and inertia axioms and the definition of the move event. □
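
The degree computation and the better-satisfies comparison used in this proof can be replayed mechanically. The Python sketch below is a hedged illustration rather than part of the formal theory: it assumes that the applicable preferences at a single time point are given as a finite set of pairs, that the degree sets are built from the "immediately preferred" relation as read from the definitions above, and that a model is represented only by its set of qualified preferential events. The function names `degrees`, `violated` and `better` are hypothetical.

```python
def degrees(prefs):
    """Return [E_1, E_2, ...] for a set of applicable preferences (e, e')."""
    events = {e for pair in prefs for e in pair}

    def covers(e, f):
        # e is immediately preferred to f: no event lies strictly in between.
        return (e, f) in prefs and not any(
            (e, g) in prefs and (g, f) in prefs for g in events)

    # Degree 1: preferred to something, with nothing preferred to it.
    first = {e for e in events
             if any((e, f) in prefs for f in events)
             and not any((f, e) in prefs for f in events)}
    levels, seen = [first], set(first)
    while levels[-1]:
        nxt = {f for f in events - seen
               if any(covers(e, f) for e in levels[-1])}
        levels.append(nxt)
        seen |= nxt
    return levels[:-1]  # drop the trailing empty level


def violated(prefs, qualified, n):
    """Qualified preferential events of degree at most n."""
    upto = set().union(*degrees(prefs)[:n])
    return upto & set(qualified)


def better(prefs, q1, q2):
    """True if q1 violates strictly fewer preferences than q2 at some degree bound."""
    depth = len(degrees(prefs))
    return any(violated(prefs, q1, n) < violated(prefs, q2, n)
               for n in range(1, depth + 1))


# The three move events and the two preferences (15) and (16) of Example 3.
O, S, C = "Move(O,L1,L4)", "Move(S,L2,L4)", "Move(C,L3,L4)"
PREFS = {(O, S), (S, C)}
M, M1, M2 = {S, C}, {O, C}, {O, S}  # qualified events in M, M', M''

if __name__ == "__main__":
    print(degrees(PREFS))
    print(better(PREFS, M, M1), better(PREFS, M, M2), better(PREFS, M1, M2))
```

With a cyclic preference set, for instance after adding the pair `(C, O)`, no event has degree 1 in this sketch, so `degrees` returns an empty list and `better` never holds.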

The following extension of Example 3 illustrates the success of a less preferred 
event when the more preferred event fails for some independent reason. 
Example 4. Suppose that, in the scenario of Example 3, Ollie slips when
attempting to move to location L4. As a result his attempt to move should fail
and he should remain at L1. However, as Ollie fails, Stan should now succeed in
his attempt to move to L4.

Let $\Theta_4 = \Theta_3 \cup \{(18), (19)\}$ where:

$$Slippery(L1)(1) \tag{18}$$

$$Occ(Move(x,l,l'))(t) \land Slippery(l)(t) \rightarrow \neg Succ(Move(x,l,l'))(t) \tag{19}$$

Then, as desired, $\Theta_4 \mathrel{|\!\approx_{PQ}} At(O,L1)(2) \land At(S,L4)(2) \land At(C,L3)(2)$.

Proof. Let M be a model of $\Theta_4$ which is minimal on facts, events and event
preferences up to time 1. Then $M_{ROP}/1$ consists of its namesake from the
previous proof and the additional fact $Slippery(L1)(1)$, and $M_P(1)$ agrees with its
namesake from the previous proof. Suppose finally that the only events which are
qualified in M are those in the set $M'_Q(1)$ of the previous proof. Then clearly M
is PQ-preferred to any model M'' of $\Theta_4$ which agrees with M on $M_{ROP}/1$ and
$M_P(1)$ but which disagrees with M on qualifications; in particular, if $M''_Q(1)$ is as
in the previous proof, then $M_Q(1) \prec_Q M''_Q(1)$ and $\emptyset \subseteq M''_Q/1 - M''_Q(1)$, so
$M \prec_{PQ} M''$. □



The final example illustrates what happens when the event preferences cannot 
be interpreted transitively in a coherent way. 

Example 5. Suppose that, in the scenario of Example 3, the event preferences 
are extended to take account of speed as well as size. Thus a preference is added 
to the effect that if Ollie and Charlie both attempt to move to the same location 
simultaneously, then Charlie, being quicker, should succeed. Clearly, if any two 
of the move events occur, then the preferences can be interpreted coherently; as 
in previous examples. However, if all three events occur simultaneously, then it 
is not possible to determine their outcome on the basis of the event preferences. 

Let $\Theta_5 = \Theta_3 \cup \{Pref(Move(C,L3,L4), Move(O,L1,L4))(1)\}$. Then there
are PQ-preferred models of $\Theta_5$ in which $At(O,L4)(2)$ is true, but there are
PQ-preferred models of $\Theta_5$ in which $At(S,L4)(2)$ is true, and PQ-preferred models
of $\Theta_5$ in which $At(C,L4)(2)$ is true.

Proof. The models of $\Theta_5$ which are minimal in facts, events, event preferences
and qualifications at time 1 all have the following event preferences applicable
in them:

$$\{Pref(Move(O,L1,L4), Move(S,L2,L4))(1),$$
$$Pref(Move(S,L2,L4), Move(C,L3,L4))(1),$$
$$Pref(Move(C,L3,L4), Move(O,L1,L4))(1)\}.$$

But, as these preferences are cyclical, the preferential events they refer to do not
have a defined degree: for any $i \ge 1$, $E_i = \emptyset$. Consequently, these models cannot
be ordered by the $\prec_Q$ relation. Thus there are PQ-preferred models in which
each of the preferential events succeeds and the other two fail. □
4 Concluding Remarks

The treatment of event preferences has been at the level of elementary events; but 
the extension to complex events is straightforward. Indeed, a defeasible version 
of the introductory soup-bowl example can be represented without further ado 
by means of definitions such as the following: 

$$Pre(LiftL)(t) \equiv OnTable(t) \land \neg HoldingL(t)$$

$$Post(LiftL)(t) \equiv HoldingL(t) \land Empty(t)$$

$$Pre(Lift)(t) \equiv Pre(LiftL)(t) \land Occ(LiftL)(t) \land Pre(LiftR)(t) \land Occ(LiftR)(t)$$

$$Post(Lift)(t) \equiv HoldingL(t) \land HoldingR(t) \land \neg Empty(t)$$

$$Pref(Lift, LiftL)(t) \land Pref(Lift, LiftR)(t)$$

The treatment has also been based on the theory of primary events. However, 
the inclusion of secondary events is straightforward; as the pragmatic refinements 
required for secondary events and event preferences are readily combined. 

As event preferences have a temporal index it is also possible to represent 
changing event preferences over time; for example, a fast horse may be favourite 
to win next week’s big race if the racecourse remains dry, while a sturdy horse 
may be favourite if the course becomes wet. In order to represent the default 
persistence of event preferences an AffP-predicate can be added (where, intu- 
itively, AffP{Pref{e, e')){t) states that the event preference Pref{e, e') is affected 
at time t) together with an appropriate inertia axiom. Definition 4 would also 
need to be extended with the effect that AffP-atoms are minimized with lowest 
priority at each time point. 
Abstract. As systems become increasingly complex, event abstraction
becomes an important issue in order to represent interactions and reason 
at the right level of abstraction. Abstract events are collections of more 
elementary events, that provide a view of the system execution at an 
appropriate level of granularity. Understanding how two abstract events 
relate to each other is a fundamental problem for knowledge represen- 
tation and reasoning in a complex system. In this paper, we study how 
two abstract events in a distributed system are related to each other 
in terms of the more elementary causality relation. Specifically, we ana- 
lyze the ways in which two abstract events can be related to each other 
orthogonally, that is, identify all the possible mutually independent re- 
lations by which two such events could be related to each other. Such 
an analysis is important because all possible relationships between two 
abstract events that can exist in the face of uncertain knowledge can be 
expressed in terms of the irreducible orthogonal relationships. 



1 Introduction 

As systems become increasingly complex, event abstraction becomes an impor- 
tant issue in order to represent interactions and reason at the right level of 
abstraction. Abstract events are collections of more elementary events, that pro- 
vide a view of the system execution at an appropriate level of granularity. Under- 
standing how two abstract events relate to each other is a fundamental problem 
for knowledge representation and reasoning in such a complex distributed sys- 
tem. This problem is of interest across philosophy, physics, artificial intelligence, 
computer science, and psychology [2]. 

Hamblin [10] and Allen [2] have shown that two linear time durations or 
intervals that are colocated can be related in one of 13 possible ways. These 13 
relations form an orthogonal set of relations, i.e., the intervals must be related 
by one and only one of these relations, implying that the conjunction of any two 
relations is the empty relation. Orthogonal relations are important because they 
identify all possible mutually exclusive relations that can possibly hold between 
any given pair of intervals and because all possible relationships between two 
intervals that can exist in the face of uncertain knowledge can be expressed 
in terms of the irreducible orthogonal relationships. The set of 13 orthogonal
relations between a pair of colocated linear intervals has been used extensively 
in the literature on artificial intelligence. For example, [8] developed a theory 
of temporal reasoning using semi-intervals which arise when there is uncertain 
and imprecise knowledge of intervals, using the 13 orthogonal relations of Allen. 
Examples of other uses of the 13 orthogonal relations between colocated linear 
intervals include [3,4,5,9,14,15,16]. 
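
To make the notion of an orthogonal set concrete, the following Python sketch classifies a pair of proper intervals into exactly one of the 13 relations. It is only an illustration: the relation names follow common usage in the temporal reasoning literature rather than any notation in this paper, intervals are assumed to be pairs `(start, end)` with `start < end`, and the function name `allen_relation` is hypothetical.

```python
def allen_relation(a, b):
    """Return the single interval relation that holds between a and b.

    Both arguments are (start, end) pairs with start < end.
    """
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Remaining cases: the intervals properly overlap.
    return "overlaps" if s1 < s2 else "overlapped-by"


if __name__ == "__main__":
    intervals = [(s, e) for s in range(4) for e in range(s + 1, 4)]
    found = {allen_relation(a, b) for a in intervals for b in intervals}
    print(sorted(found))
```

Because the function returns a single label for every pair, the relations are mutually exclusive by construction, and enumerating all intervals with endpoints in a four-point domain already exhibits each of the 13 cases.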

The literature surveyed above considered the interactions and relative place- 
ment of time intervals, each of which can be viewed as a linearly ordered set 
of time instants. An additional assumption was that time was continuous, and 
hence the time intervals satisfy the density axiom (see van Benthem [6] for the
formal definitions and a detailed discussion of continuity and density).

Our objective is to study how two abstract events in a distributed system are 
related to each other in terms of the causality relation. The relativistic space-time 
model is an appropriate model of a distributed system execution for this study. 
We analyze the ways in which two abstract events can be related to each other 
orthogonally, that is, identify all the possible mutually independent relations by 
which two such events could be related to each other. The results of this paper 
differ from the work surveyed above in the following aspects. Each of the abstract 
events we consider is a partial order of more elementary events, unlike the time 
intervals which linearly order the component time instants. Additionally, the 
system model explicitly models individual events/actions/statement executions 
that occur at different processes in the execution of a complex distributed system, 
and hence models discrete events explicitly. 

The work is motivated by the fact that in a distributed system, abstract 
events, wherein at least some of the component elementary events of the ab- 
stract event occur concurrently, are of great interest in simplifying the reason- 
ing about distributed executions [12,13]. Henceforth, we also term such abstract 
events as poset (partially ordered set) events. Such poset events accurately model 
collaborative activity among multiple CPU subsystems in a distributed system, 
for various applications like navigation, planning, robotics, mobile computing, 
coordination among multiple participants in a virtual reality environment, and 
agent-based distributed cooperating programs. As a specific example, multiple 
autonomous robots need to cooperate to jointly solve a task such as to focus laser 
beams on a target so that the beams arrive at the target at a fixed moment. As 
another example, multiple roving mobile agents that can communicate only by 
message passing need to synchronize their actions in an adversarial environment. 
Causality between poset events has been studied in [12] wherein a spectrum of 
fine-grained causality relations between poset events was presented, along with 
an axiom system to reason with such relations. These relations provide a pre- 
cise handle to express and represent a naturally occurring or enforce a desired 
fine-grained level of causality or synchronization among the cooperating agents. 
However, these relations are not orthogonal relations. In this paper, we present 
a methodology for deriving orthogonal relations between poset events. Section 2 
gives the system model. Section 3 gives the main results. Section 4 concludes. 
2 System Model and Preliminaries

A poset event structure model $(E, \prec)$, where $\prec$ is an irreflexive partial ordering
representing the causality relation on the possibly infinite event set $E$, is used as
the space-time model for a system execution, as in [12]. $(E, \prec)$ can follow either
the discreteness or the density axioms [6]. $E$ is partitioned into local executions
at coordinates in the space dimensions. Each $E_i$ is a linearly ordered set of
events in partition $i$ and corresponds to the execution of events by a distinct
process $i$. An event $e$ in partition $i$ is denoted $e_i$. The causality relation on $E$ is
the transitive closure of the local ordering relation on each $E_i$ and the ordering
imposed by message send events and message receive events. In [11,12], poset
events are defined as follows. Let $\mathcal{E}$ denote the power set of $E$. Let
$\mathcal{A}\ (\neq \emptyset) \subseteq (\mathcal{E} - \emptyset)$. $\mathcal{A}$ is the set of all those sets that represent a higher level
grouping of the events of $E$ of interest to an application. Each element $A$ of $\mathcal{A}$ is
a subset of $E$, and is termed an abstract event or a poset event.

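Table 1. The six basic relations, see [11,12].

Relation r    Expression for r(X, Y)
R1            $\forall x \in X\ \forall y \in Y,\ x \prec y$  ($= \forall y \in Y\ \forall x \in X,\ x \prec y$)
R2            $\forall x \in X\ \exists y \in Y,\ x \prec y$
R2'           $\exists y \in Y\ \forall x \in X,\ x \prec y$
R3            $\exists x \in X\ \forall y \in Y,\ x \prec y$
R3'           $\forall y \in Y\ \exists x \in X,\ x \prec y$
R4            $\exists x \in X\ \exists y \in Y,\ x \prec y$  ($= \exists y \in Y\ \exists x \in X,\ x \prec y$)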


The causality relations between a pair of poset events were formulated in [12]
using the notion of proxies. Each poset event X was defined to have two proxies
- the set of its least elements $L_X$, and the set of its greatest elements $U_X$. These
proxies were the equivalents of the beginning and end instants of the linearly
ordered interval. Two alternate definitions of proxies were given:

— Definition 4 [12], viz., $L_X = \{e_i \in X \mid \forall e_i' \in X,\ e_i \preceq e_i'\}$ and
$U_X = \{e_i \in X \mid \forall e_i' \in X,\ e_i \succeq e_i'\}$, and

— Definition 5 [12], viz., $L_X = \{e \in X \mid \forall e' \in X,\ e \nsucc e'\}$ and
$U_X = \{e \in X \mid \forall e' \in X,\ e \nprec e'\}$
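
The two proxy definitions can be contrasted on a small example. The Python sketch below is a tentative illustration: it assumes events are `(process, sequence)` pairs, reads Definition 4 as taking the earliest and latest event of X on each process, and reads Definition 5 as taking the minimal and maximal elements of X under the global causality relation. The function names are hypothetical, and the readings of the two definitions are assumptions based on the surrounding text.

```python
def proxies_def5(X, prec):
    """Global reading: minimal and maximal elements of X under prec."""
    lower = {e for e in X if not any(prec(f, e) for f in X)}
    upper = {e for e in X if not any(prec(e, f) for f in X)}
    return lower, upper


def proxies_def4(X, prec):
    """Per-process reading: earliest and latest event of X on each process."""
    def same_process(e, f):
        return e[0] == f[0]

    lower = {e for e in X
             if all(e == f or prec(e, f) for f in X if same_process(e, f))}
    upper = {e for e in X
             if all(e == f or prec(f, e) for f in X if same_process(e, f))}
    return lower, upper


# Two processes with two events each; a message from (0, 2) to (1, 1)
# makes the four events a single causal chain in the order listed.
ORDER = [(0, 1), (0, 2), (1, 1), (1, 2)]


def chain_prec(e, f):
    return ORDER.index(e) < ORDER.index(f)


if __name__ == "__main__":
    X = set(ORDER)
    print(proxies_def4(X, chain_prec))
    print(proxies_def5(X, chain_prec))
```

On this chain the per-process reading keeps one least and one greatest event per process, whereas the global reading keeps only the first and last event of the whole chain.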

Figure 1 depicts the proxies of X and shows the difference between the two 
definitions. In the figure, the time axis goes from left to right, and the lines 
with arrows denote the messages that impose causality across different processes 
(points in space). Depending on the problem domain, an application chooses and 
consistently uses one definition of proxy. For example, for events in a distributed 
sensor/robot system, where the various sensors/robots cooperate to perform 
loosely synchronized actions, the former definition is more suitable to represent 
the start and end of interactions. When different mobile agents invoke services
offered by other agents/servers in a nested Remote Procedure Call (RPC) form, 
the latter definition is more suitable to represent the start and end of interactions. 



Fig. 1. Poset event X and its proxies $L_X$ and $U_X$. The proxies defined by Definition 4
are shown by the closely spaced dashed lines. The proxies defined by Definition 5 are
shown by dotted lines. (Legend: arrows denote messages between processes; dots
denote elementary events.)

The causality relations in [12] were defined using the following two aspects of
specifying the relations, based on the concept of proxies. (i) As there is a choice
of two proxies of X and a choice of two proxies of Y, there are four combinations
between the proxies. (ii) The six causality relations in Table 1 can be specified
for each combination, thus yielding 24 relations between X and Y. The set of
these causality relations is denoted $\mathcal{R}$. The following nomenclature was adopted
to name the relations in $\mathcal{R}$. Relation $R?\#(X,Y)$ was such that $R?$ was a value
from {R1, R2, R3, R4} and indicated the choice of proxies of X and Y, whereas
# indicated how the chosen proxies were related to each other, and took a value
from {a, b, b', c, c', d}, where R1, R2, R2', R3, R3', R4 were renamed a, b, b', c,
c', d, respectively, to avoid confusion with the previous usage of the relations R1
- R4. The set of relations $\mathcal{R}$ between poset events was complete using first-order
predicate logic and only the $\prec$ relation between elementary events. The relation
algebra given in [12] can be viewed as a power algebra [7].

In this paper, the label $\mathcal{R}$ is used to denote the set of the above relations
when the discussion is common to the relations defined using either definition of
proxies, viz., Definition 4 or 5 [12]. If the distinction matters, the notations
$\mathcal{R}^{\prec_i}$ and $\mathcal{R}^{\prec}$ are used to denote the sets of relations that result when
Definition 4 and 5 of proxies, respectively, are used. Intuitively, $\mathcal{R}^{\prec_i}$ indicates the
set of relations resulting when the proxies are defined using the $\prec$ relation on each
$E_i$, and $\mathcal{R}^{\prec}$ indicates the set of relations resulting when the proxies are defined
using the $\prec$ relation on $E$. Each of $\mathcal{R}^{\prec}$ and $\mathcal{R}^{\prec_i}$ forms a hierarchy of dependent
relations as shown in Figure 2. The relative hierarchy among relations in $\mathcal{R}^{\prec}$
and relations in $\mathcal{R}^{\prec_i}$ is given in [12].

A set of axioms to reason with the relations in $\mathcal{R}^{\prec}$ was given in [12]. The
set of axioms was complete in the sense that (i) given any $R(X,Y)$, the axioms
gave all enumerations of valid relations $r(X,Y)$ and $r'(Y,X)$, for $r, r', R \in \mathcal{R}^{\prec}$,
and (ii) given $r_1(X,Y) \land r_2(Y,Z)$, the axioms gave all relations $r(X,Z)$ (and
from (i), all $r'(Z,X)$), for $r, r', r_1, r_2 \in \mathcal{R}^{\prec}$. Hence, the axioms could be used to
derive all possible implications from any given predicates on relations in $\mathcal{R}^{\prec}$.




Fig. 2. Hierarchy of causality relations, ordered by “is a subrelation of” [12]. An edge
from r1 to r2 indicates that r1 is a subrelation of r2.



In the next section, we give a methodology to enumerate the set of orthogonal
relations for $\mathcal{R}$. The results of implementing this methodology for $\mathcal{R}^{\prec}$ using the
axioms of [12] are then given. In this paper, we also modify the axiom system to
make it applicable to $\mathcal{R}^{\prec_i}$. We then apply the above methodology to enumerate
the set of orthogonal relations for $\mathcal{R}^{\prec_i}$ and give the results.

3 Orthogonal Relations 

We now propose a method to derive and enumerate the orthogonal relations
between any pair of poset events, using the set of dependent relations $\mathcal{R}$. We
also present the numerical results of enumerating the orthogonal relations for
$\mathcal{R}^{\prec}$ and $\mathcal{R}^{\prec_i}$ based on the appropriate axiom system. Specifically, for $\mathcal{R}^{\prec}$, we
use axioms XP1-XP14 given in [12]. For $\mathcal{R}^{\prec_i}$ we use axioms XP1-XP6 and eight
new axioms XP7$^{\prec_i}$-XP14$^{\prec_i}$. The results of the two enumerations were obtained
by implementing the methodology in XSB Prolog.

The algorithm proposed here has the following two steps to create a (complete
and mutually independent) set of orthogonal relations from the set of dependent
relations $\mathcal{R}$.

1. Identify all possible combinations of relations $r(X,Y) \in \mathcal{R}$ that can hold
simultaneously for a given X and Y.

2. For each of the identified combinations of relations $r(X,Y)$, identify all
combinations of $r(Y,X)$ that can simultaneously hold for the same X and Y.



3.1 Step 1: All Possible Relations r(X, Y)

As a first step, we identify all the combinations of relations $r(X,Y)$, for
$r \in \mathcal{R}$, that hold between poset events X and Y. Note that by construction,
$(\mathcal{R}, \sqsubseteq)$, where $\sqsubseteq$ is the relation “is a subrelation of”, is a lattice as
illustrated in Figure 2. For a given pair of posets X and Y, it may be the case that
a combination of the relations in $\mathcal{R}$ may hold. Specifically, if $R(X,Y)$ holds, then
$\forall R' \mid R \sqsubseteq R'$, $R'(X,Y)$ holds. Thus, if $R(X,Y)$ holds, then for each $R'$ in the
upward-closed subset¹ of $\mathcal{R}$, $R'(X,Y)$ holds. In the partial order $(\mathcal{R}, \sqsubseteq)$, all
upward-closed subsets of $\mathcal{R}$ correspond exactly to the combinations of relations in
$\mathcal{R}$ that can hold concurrently for a given pair of poset events. It follows from the
result on page 400 [1] that there is a 1-1 correspondence between the set of all
upward-closed subsets of a partial order and the set of antichains² in the partial
order. Therefore, an enumeration of the antichains in $(\mathcal{R}, \sqsubseteq)$ gives an enumeration
of the upward-closed subsets of $(\mathcal{R}, \sqsubseteq)$, which corresponds to all the combinations
of the relations in $\mathcal{R}$ that can hold for a pair of poset events. Let $\mathcal{RAC}$ be the
set of all such antichains. A member of $\mathcal{RAC}$, denoted $rac(X,Y)$, is an antichain
of $\mathcal{R}$ and can be expressed as the conjunction of the members of the antichain,
each of which is a member of $\mathcal{R}$, i.e., $rac(X,Y)$ can be viewed as
$\bigwedge_{r \in rac(X,Y)} r(X,Y)$. The number of antichains in $\mathcal{RAC}$ was computed by the
implementation of axioms XP1-XP6 (given below), to be as follows. There are 1, 24,
147, 350, 341, 168, 44, 2, and 0 antichains of size 0 through 8, respectively, giving
a total of 1077 antichains. The antichain of size 0 denotes the empty-set
upward-closed subset of $\mathcal{R}$, equivalent to $\overline{R4d}(X,Y)$, where $\overline{R4d}(X,Y)$ denotes that
$R4d(X,Y)$ is false. Observe from Figure 2 that the size of the largest antichain is 7.

The axioms XP1 - XP6 from [12] are reproduced here. The relation $\|(r_1,r_2)$
stands for $\not\sqsubseteq(r_1,r_2) \land \not\sqsubseteq(r_2,r_1)$. $\mathcal{P}_1$ denotes the set {1, 2, 3, 4} and $\mathcal{P}_2$ denotes
the set {a, b, b', c, c', d}.

XP1. $R1? \sqsubseteq R2? \sqsubseteq R4?$, where ? is instantiated from $\mathcal{P}_2$
XP2. $R1? \sqsubseteq R3? \sqsubseteq R4?$, where ? is instantiated from $\mathcal{P}_2$
XP3. $R2? \,\|\, R3\#$, where ? and # are separately instantiated from $\mathcal{P}_2$
XP4. $R?a \sqsubseteq R?b' \sqsubseteq R?b \sqsubseteq R?d$, where ? is instantiated from $\mathcal{P}_1$
XP5. $R?a \sqsubseteq R?c \sqsubseteq R?c' \sqsubseteq R?d$, where ? is instantiated from $\mathcal{P}_1$
XP6. $R?b \,\|\, R?c'$, $R?b' \,\|\, R?c'$, $R?b \,\|\, R?c$, $R?b' \,\|\, R?c$, where ? is instantiated from
$\mathcal{P}_1$

¹ A set $\Re \subseteq \mathcal{R}$ is upward-closed iff for every $r, r' \in \mathcal{R}$, $(r \in \Re \land r \sqsubseteq r') \Rightarrow r' \in \Re$.
² A set $\Re$ is an anti-chain iff for every r and r' in $\Re$, $r \not\sqsubseteq r' \land r' \not\sqsubseteq r$.



3.2 Step 2: Relations r{Y,X), Given That Certain r{X,Y) Hold 

The computed combinations of relations in $\mathcal{R}$, viz., antichains in $(\mathcal{R}, \sqsubseteq)$, are
useful to determine all the orthogonal relations that can exist between any two
poset events. For each of the $|\mathcal{RAC}|$ antichains that hold between X and Y,
there are potentially $|\mathcal{RAC}|$ antichains that hold between Y and X, thus leading
to a potential $|\mathcal{RAC}|^2$ orthogonal relations between X and Y. Several of these
relations will be illegal because they contradict the relations $r(X,Y)$. The
objective is to determine exactly all the orthogonal relations that are admissible
by the axiom system. For each $rac1(X, Y)$, where $rac1 \in \mathcal{RAC}$, determine which
$rac2(Y, X)$ can hold, where $rac2 \in \mathcal{RAC}$, using the axiom system which allows
the derivation of all $r'(Y,X)$ from any $r(X,Y)$, where $r, r' \in \mathcal{R}$. Then each
conjunction of an antichain $rac1(X,Y)$ and a compatible antichain $rac2(Y, X)$
is orthogonal to every other such conjunction; denote this set of conjunctions
as $\mathcal{RO}$, which then represents all the possible orthogonal relations between two
posets, based on the $\prec$ relation among elementary events.

Let us denote the sets of orthogonal relations obtained for relations in TZ'^ 
and TZ'^' by TZO^ and TZO^' , respectively. 



Table 2. Number of orthogonal relations in TZO^ , classified based on size of antichains. 



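Size/Number of rac(X,Y) antichains || Number of antichains rac(Y,X) of size s, s = 0 ... 7 || Sum over s = 0 ... 7

Size/Number || s = 0 | s = 1 | s = 2 | s = 3 | s = 4 | s = 5 | s = 6 | s = 7 || Sum
0 / 1       ||     1 |    24 |   147 |   350 |   341 |   168 |    44 |     2 || 1077
1 / 24      ||    24 |   261 |   898 |  1285 |   822 |   264 |    34 |     1 || 3589
2 / 147     ||   147 |   898 |  1911 |  1683 |   642 |   130 |     4 |     0 || 5415
3 / 350     ||   350 |  1285 |  1683 |   937 |   180 |     8 |     0 |     0 || 4443
4 / 341     ||   341 |   822 |   642 |   180 |    18 |     0 |     0 |     0 || 2003
5 / 168     ||   168 |   264 |   130 |     8 |     0 |     0 |     0 |     0 ||  570
6 / 44      ||    44 |    34 |     4 |     0 |     0 |     0 |     0 |     0 ||   82
7 / 2       ||     2 |     1 |     0 |     0 |     0 |     0 |     0 |     0 ||    3
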

Relations TZO~' . Axioms XP7-XP14 along with XP1-XP6 were used to deter- 
mine all the orthogonal relations TZO^, counted in Table 2. Axioms XP7-XP14 
are reproduced below with labels XP7^ - XP14^, respectively. 

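XP7^. $R1a(X,Y) \vee R1b(X,Y) \vee R1b'(X,Y) \vee R1c(X,Y) \vee R1c'(X,Y) \Rightarrow \overline{R4d}(Y,X)$.

XP8^. $R1d(X,Y) \Rightarrow \overline{R4b}(Y,X) \wedge \overline{R4c'}(Y,X)$.

XP9^. $R2a(X,Y) \vee R2b(X,Y) \vee R2b'(X,Y) \vee R2c(X,Y) \vee R2c'(X,Y) \Rightarrow \overline{R2d}(Y,X)$.

XP10^. $R2d(X,Y) \Rightarrow \overline{R2b}(Y,X) \wedge \overline{R2c'}(Y,X)$.

XP11^. $R3a(X,Y) \vee R3b(X,Y) \vee R3b'(X,Y) \vee R3c(X,Y) \vee R3c'(X,Y) \Rightarrow \overline{R3d}(Y,X)$.

XP12^. $R3d(X,Y) \Rightarrow \overline{R3b}(Y,X) \wedge \overline{R3c'}(Y,X)$.

XP13^. $R4a(X,Y) \vee R4b(X,Y) \vee R4b'(X,Y) \vee R4c(X,Y) \vee R4c'(X,Y) \Rightarrow \overline{R1d}(Y,X)$.

XP14^. $R4d(X,Y) \Rightarrow \overline{R1b}(Y,X) \wedge \overline{R1c'}(Y,X)$.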

Table 2 consists of three parts, separated by vertical double-lines. The first
part categorizes the $|\mathcal{RAC}(X,Y)|$ antichains of Figure 2, based on size which
ranges from 0 to 7. Each row $i$, $i \in [0 \ldots 7]$, in the entire table is used to compute
the orthogonal relations in which antichains $rac(X, Y)$ have size $i$. Consider any
row $i$. For each antichain $rac(X, Y)$ of size $i$, the number of the corresponding
legal (as per XP7^-XP14^) antichains $rac(Y,X)$ of size $s$, $s \in [0, \ldots, 7]$, are
added to column $s$ in the second part of the table. The entry in row $i$ in the last
part of the table sums up the row entries of columns $s = 0$ through $s = 7$ of
that row, and gives the total number of orthogonal relations in which antichains
$rac(X, Y)$ have size $i$. The sum of the last column is 17,182, the total number of
orthogonal relations counted in Table 2.

Note that $\mathcal{RAC}$ needs to consider all the antichains in $\mathcal{R}$, not just the
maximal antichains, because even a subset of a maximal antichain identifies a
different upward-closed subset of $\mathcal{R}$ than does the maximal antichain, indicating
a different set of relations that hold. Also note that for any $rac1(X, Y)$, all
relations in the upward-closed subset of $\mathcal{R}$ hold and those not in the upward-closed
subset do not hold. Thus, for any $rac1(X,Y)$, there is a bit-vector of size 24
where each bit corresponds to a relation in $\mathcal{R}$, such that there is a “1” for each
relation in the upward-closed subset of $rac1(X,Y)$ and a “0” for each relation
not in the upward-closed subset of $rac1(X,Y)$. Analogously, for any $rac2(Y,X)$
that is compatible with $rac1(X,Y)$ as per the axioms, there is a bit-vector of
size 24 where each bit corresponds to a relation in $\mathcal{R}$, such that there is a “1”
for each relation in the upward-closed subset of $rac2(Y,X)$ and a “0” for each
relation not in the upward-closed subset of $rac2(Y, X)$. Each orthogonal relation
can thus be represented by a 48-bit vector.

Example: For the $rac1(X,Y)$ antichain $R2b(X,Y) \wedge R2c(X,Y) \wedge R3a(X,Y)$
of size three, the axioms XP7^-XP14^ give $\overline{R2d}(Y,X) \wedge \overline{R3d}(Y,X)$. The only
possible antichains $rac2(Y, X)$ can be from the set of relations $\{R4{*}(Y,X)\}$;
this gives 11 possible antichains $rac2(Y, X)$, counting the antichain of size 0, that
are compatible with $rac1(X,Y)$. Each of these 11 combinations of $rac2(Y,X)$
with $rac1(X,Y)$ yields a unique 48-bit vector.
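The bit-vector encoding and the count of 11 in the example can be illustrated as follows. This is a sketch under the assumption that the 24 relations are ordered by the product order suggested by axioms XP1-XP6; the helper names (`leq`, `is_antichain`, `bit_vector`) are hypothetical and the bit ordering is an arbitrary choice.

```python
from itertools import combinations, product

# Assumed strict orders on relation numbers and letters (transitively closed).
NUM_LESS = {(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)}
LET_LESS = {("a", "b'"), ("b'", "b"), ("a", "b"),
            ("a", "c"), ("c", "c'"), ("a", "c'"),
            ("a", "d"), ("b", "d"), ("b'", "d"), ("c", "d"), ("c'", "d")}
RELATIONS = list(product((1, 2, 3, 4), ("a", "b", "b'", "c", "c'", "d")))

def leq(r, s):
    return ((r[0] == s[0] or (r[0], s[0]) in NUM_LESS) and
            (r[1] == s[1] or (r[1], s[1]) in LET_LESS))

def is_antichain(rels):
    return all(not leq(r, s) and not leq(s, r) for r, s in combinations(rels, 2))

def bit_vector(antichain):
    """24-bit string with '1' for each relation in the upward closure of the antichain."""
    return "".join("1" if any(leq(r, s) for r in antichain) else "0"
                   for s in RELATIONS)

# rac1(X,Y) from the example: R2b, R2c and R3a.
rac1 = [(2, "b"), (2, "c"), (3, "a")]

# Only R4* relations may hold for (Y,X); enumerate their antichains, size 0 included.
r4_relations = [r for r in RELATIONS if r[0] == 4]
rac2_candidates = [c for k in range(len(r4_relations) + 1)
                   for c in combinations(r4_relations, k) if is_antichain(c)]

# One 48-bit vector per compatible pair (rac1, rac2).
vectors = {bit_vector(rac1) + bit_vector(rac2) for rac2 in rac2_candidates}
print(len(rac2_candidates), len(vectors))
```

Distinct antichains have distinct upward closures, which is why the concatenated vectors stay pairwise different.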



Relations TZO^^ . Observe that the axioms XP7-XP14 given in [12] are ap- 
plicable only to relations in TZ~^ which use Definition 5 of proxies [12], and not 
to relations in 77.^* which use Definition 4 of proxies [12]. If proxies are defined 
by Definition 4 and not Definition 5, then the axioms XP7-XP14 need to be 
replaced by the following axioms XP7^*-XP14^* to obtain all the orthogonal 
relations TZO^d 




734 A. Kshemkalyani and R. Kamath 



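XP7^*. $R1a(X,Y) \Rightarrow \overline{R4d}(Y,X)$;
$R1b(X,Y) \vee R1b'(X,Y) \Rightarrow \overline{R4b}(Y,X)$;
$R1c(X,Y) \vee R1c'(X,Y) \Rightarrow \overline{R4c'}(Y,X)$.

XP8^*. $R1d(X,Y) \Rightarrow \overline{R4a}(Y,X)$.

XP9^*. $R2a(X,Y) \Rightarrow \overline{R2d}(Y,X)$;
$R2b(X,Y) \vee R2b'(X,Y) \Rightarrow \overline{R2b}(Y,X)$;
$R2c(X,Y) \vee R2c'(X,Y) \Rightarrow \overline{R2c'}(Y,X)$.

XP10^*. $R2d(X,Y) \Rightarrow \overline{R2a}(Y,X)$.

XP11^*. $R3a(X,Y) \Rightarrow \overline{R3d}(Y,X)$;
$R3b(X,Y) \vee R3b'(X,Y) \Rightarrow \overline{R3b}(Y,X)$;
$R3c(X,Y) \vee R3c'(X,Y) \Rightarrow \overline{R3c'}(Y,X)$.

XP12^*. $R3d(X,Y) \Rightarrow \overline{R3a}(Y,X)$.

XP13^*. $R4a(X,Y) \Rightarrow \overline{R1d}(Y,X)$;
$R4b(X,Y) \vee R4b'(X,Y) \Rightarrow \overline{R1b}(Y,X)$;
$R4c(X,Y) \vee R4c'(X,Y) \Rightarrow \overline{R1c'}(Y,X)$.

XP14^*. $R4d(X,Y) \Rightarrow \overline{R1a}(Y,X)$.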



Axioms XP1-XP6 and XP7^*-XP14^‘ are used to derive the orthogonal re- 
lations TZO^y instead of axioms XP1-XP6 and XP7^-XP14^ that were used to 
obtain TZO^ . Results analogous to those in Table 2 for TZO^ are obtained for 
TZO^' and shown in Table 3. The sum of the last column is 123,474 = \TZO^'\. 



Table 3. Number of orthogonal relations in TZO ^' , classified based on size of antichains. 



Size/Number of rac(X,Y) antichains || Number of antichains rac(Y,X) of size s, s = 0 ... 7 || Sum over s = 0 ... 7

Size/Number || s = 0 | s = 1 | s = 2 | s = 3 | s = 4 | s = 5 | s = 6 | s = 7 || Sum
0 / 1       ||     1 |    24 |   147 |   350 |   341 |   168 |    44 |     2 ||  1077
1 / 24      ||    24 |   405 |  1926 |  3695 |  3084 |  1326 |   293 |    11 || 10764
2 / 147     ||   147 |  1926 |  7097 | 11493 |  7963 |  2768 |   527 |    18 || 31939
3 / 350     ||   350 |  3695 | 11493 | 16469 |  9406 |  2654 |   469 |    16 || 44552
4 / 341     ||   341 |  3084 |  7963 |  9406 |  4158 |   802 |   132 |     4 || 25890
5 / 168     ||   168 |  1326 |  2768 |  2654 |   802 |    18 |     0 |     0 ||  7736
6 / 44      ||    44 |   293 |   527 |   469 |   132 |     0 |     0 |     0 ||  1465
7 / 2       ||     2 |    11 |    18 |    16 |     4 |     0 |     0 |     0 ||    51


4 Conclusions 

Orthogonal relations between events provide an understanding of all possible 
mutually exclusive relations that can hold between the events when complete 
and precise knowledge is available. These form the basis of relation algebras, 
and allow the derivation of relations to represent knowledge when imprecise and 
incomplete information is available. Abstract events, each of which is a partially 
ordered collection of elementary events, are important when reasoning and repre- 
senting actions in complex distributed systems. We derived orthogonal relations 
$\mathcal{RO}$ between abstract events using the space-time model for a distributed system
execution. Relations in $\mathcal{RO}$ are analogous to the 13 orthogonal relations between
linear intervals at a point in space [2]. Relations in $\mathcal{RO}$ are also analogous to the
following sets of orthogonal relations based on the elementary causality relation:
(i) the three orthogonal relations between two points in space-time ($a \prec b$, $b \prec a$,
$a \not\prec b \wedge b \not\prec a$), (ii) the six orthogonal relations between a linear interval and
a point in space-time [11], (iii) the 29 orthogonal relations between two linear 
intervals in space-time using the dense model of time [11], and (iv) the 40 or- 
thogonal relations between two linear intervals in space-time using the nondense 
model of time [11]. We expect that as distributed agent-based programs and 
applications become more common, specific uses for these orthogonal relations 
between abstract events will emerge, similar to the uses of the 13 orthogonal 
relations between colocated linear intervals. 

Abstract. Using mathematical morphology on formulas, introduced
recently by Bloch and Lang (Proceedings of IPMU'2000), we define two
new explanatory relations. Their logical behavior is analyzed. The
results show that these natural ways of defining preferred explanations
are robust because these relations satisfy almost all postulates of
explanatory reasoning introduced by Pino-Perez and Uzcategui
(Artificial Intelligence, 111:131-169, 1999). Actually, the first explanatory
relation is Explanatory-Rational. The second one is not even
Explanatory-Cumulative but it satisfies new weak postulates.



1 Introduction 

The process of inferring the best explanation of an observation is usually known
as abduction. In the logic-based approach to abduction, the background theory
is given by a consistent set of formulas $\Sigma$. The notion of a possible explanation
is defined by saying that a formula $\gamma$ is an explanation of $\alpha$ if $\Sigma \cup \{\gamma\} \vdash \alpha$. An
explanatory relation is a binary relation $\rhd$ where the intended meaning of $\alpha \rhd \gamma$
is “$\gamma$ is a preferred explanation of $\alpha$”.

In [4], a set of postulates that should be satisfied by preferred explanatory 
relations is proposed and discussed. 

The aim of this work is at least threefold. First, to propose very natural ex- 
planatory relations that in some cases are computationally practicable. Second, 
to examine the adequacy of logical postulates proposed in [4] and third, the 
discovery of new logical properties for the explanatory reasoning. 

In order to accomplish our goals we propose concrete definitions of preferred 
explanations based on mathematical morphology. The starting point is a very 
general setting: a relation between worlds that in most of the cases can be viewed 
as a graph connecting worlds. 

Mathematical morphology operators on logical formulas have been intro- 
duced recently in [1]. These ideas allow us to define the most central part of a 
formula, according to the fundamental principles of this theory (see e.g. [6,7]). 
Using this notion we define two explanatory relations. The first one, $\rhd^{\ell ne}$, has
the following intended meaning: $\gamma$ is a preferred explanation of $\alpha$ if $\gamma$ is a formula
entailing the most central part of the conjunction of $\Sigma$ with $\alpha$. For the second
one, $\rhd^{\ell c}$, we define a sequence which approximates the most central part of $\Sigma$;
then we say that $\gamma$ is a preferred explanation of $\alpha$ if $\gamma$ implies the conjunction
of $\alpha$ with the closest element of the sequence which is consistent with $\alpha$.

2 Preliminaries 

Let us recall here the basic principles of morpho-logics. Let $PS$ be a finite set of
propositional symbols. The language is generated by $PS$ and the usual connectives.
Well-formed formulas will be denoted by Greek letters $\varphi, \psi, \ldots$ Worlds will
be denoted by $\omega, \omega', \ldots$ and the set of all worlds by $\Omega$. $Mod(\varphi) = \{\omega \in \Omega \mid \omega \models \varphi\}$
is the set of all worlds where $\varphi$ is satisfied. Dilation and erosion (the two
fundamental operations of mathematical morphology [6]) of a formula $\varphi$ by a
structuring element $B$ have been defined in [1] as follows:

$Mod(D_B(\varphi)) = \{\omega \in \Omega \mid B(\omega) \cap Mod(\varphi) \neq \emptyset\}$, (1)

$Mod(E_B(\varphi)) = \{\omega \in \Omega \mid B(\omega) \models \varphi\}$. (2)

In these equations, the structuring element $B$ represents a relationship between
worlds, i.e. $\omega' \in B(\omega)$ iff $\omega'$ satisfies some relationship with $\omega$. The
condition in Equation 1 expresses that the set of worlds in relation to $\omega$ should be
consistent with $\varphi$, i.e.: $\exists \omega' \in B(\omega), \omega' \models \varphi$. The condition in Equation 2 is
stronger and expresses that $\varphi$ should be satisfied in all worlds which stand in
relation to $\omega$.

2.1 Properties 

The properties of these basic operations and of other derived operations are de- 
tailed in [1]. The fundamental properties of erosion, that will be used intensively 
in the following, can be summarized as: 

- Independence of the syntax (follows directly from the definition through the
models).

- Monotonicity: erosion is increasing with respect to $\varphi$, i.e.

$\varphi \vdash \psi \Rightarrow E_B(\varphi) \vdash E_B(\psi)$, (3)

for any structuring element $B$. Erosion is decreasing with respect to the
structuring element, i.e.

$\forall \omega \in \Omega, B(\omega) \subseteq B'(\omega) \Rightarrow E_{B'}(\varphi) \vdash E_B(\varphi)$. (4)

- Anti-extensivity¹: if $B$ is derived from a reflexive relation, i.e. such that
$\forall \omega \in \Omega, \omega \in B(\omega)$, the erosion is anti-extensive, i.e.

$E_B(\varphi) \vdash \varphi$. (5)

We will only deal with such cases in what follows. We will also consider
symmetrical relations, i.e. $\forall (\omega, \omega') \in \Omega^2, \omega \in B(\omega') \Leftrightarrow \omega' \in B(\omega)$.

- Iteration: Erosion satisfies an iteration property, which is expressed for
symmetrical structuring elements as:

$E_B[E_{B'}(\varphi)] = E_{D_B(B')}(\varphi)$. (6)

For instance if $B = B'$, and if we denote by $E^n$ the erosion by $B$ dilated
$(n - 1)$ times by itself (this is typically the case for distance based operations
where the structuring element is a ball of distance, as will be seen in Section
2.2), we have:

$E^n[E^{n'}(\varphi)] = E^{n+n'}(\varphi)$, (7)

where $n, n'$ denote the size of the erosion (i.e. the “radius” of the structuring
element).

- Commutativity with conjunction:

$E_B(\bigwedge_{i=1}^{m} \varphi_i) = \bigwedge_{i=1}^{m} E_B(\varphi_i)$. (8)

- Erosion of a disjunction: erosion and disjunction do not commute, but we
have a partial relation:

$E_B(\varphi) \vee E_B(\psi) \vdash E_B(\varphi \vee \psi)$. (9)

¹ In set theoretical mathematical morphology an operation $\Psi$ is said to be
anti-extensive iff for any set $X$, $\Psi(X) \subseteq X$.



2.2 Illustrative Example 

In all that follows, we will consider as an illustrative example the case where the
structuring element is defined as a ball of the Hamming distance between worlds
$d_H$, where $d_H(\omega, \omega')$ is the number of propositional symbols that are instantiated
differently in both worlds. Then dilation and erosion of size $n$ are defined from
Equations 1 and 2 by using the distance balls of radius $n$ as structuring elements:

$Mod(D^n(\varphi)) = \{\omega \in \Omega \mid \exists \omega' \in \Omega, \omega' \models \varphi \text{ and } d_H(\omega, \omega') \leq n\}$, (10)

$Mod(E^n(\varphi)) = \{\omega \in \Omega \mid \forall \omega' \in \Omega, d_H(\omega, \omega') \leq n \Rightarrow \omega' \models \varphi\}$. (11)

We make use of a graph representation of worlds, where each node represents
a world and a link represents an elementary connection between two worlds, i.e.
being at distance 1 from each other. A ball of radius 1 centered at $\omega$ is constituted
by $\omega$ and the extremities of the arcs originating in $\omega$. This allows for an easy
visualization of the effects of transformations.

Let us consider an example with three propositional symbols a, b, c. The 
possible worlds are represented in Figure 1. 

Let us consider $\varphi = \neg a \wedge b \wedge c$. Then we have:

$D^1(\varphi) = (\neg a \wedge b) \vee (\neg a \wedge c) \vee (b \wedge c)$,

$D^2(\varphi) = \neg a \vee b \vee c = \neg(a \wedge \neg b \wedge \neg c)$.

Fig. 1. Graph representation of possible worlds with 3 symbols and an example of $\varphi$
and two successive dilations. An arc between two nodes means that the corresponding
nodes are at a distance of 1 from each other.



These results are illustrated in Figure 1. Notice that in this kind of figures the 
formula defined by a border is the disjunction of the formulas in the interior of 
the border. 
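The two dilations of the example can be reproduced by brute force over the eight worlds. The following is a minimal sketch of distance-based dilation and erosion on sets of models; formulas are represented as Python predicates over (a, b, c), and the function names are illustrative only.

```python
from itertools import product

# All worlds over the three propositional symbols (a, b, c).
WORLDS = list(product((False, True), repeat=3))

def mod(formula):
    """Set of worlds satisfying a formula given as a predicate over (a, b, c)."""
    return {w for w in WORLDS if formula(*w)}

def dist(u, v):
    """Hamming distance between two worlds."""
    return sum(x != y for x, y in zip(u, v))

def dilate(models, n):
    """Worlds within distance n of at least one model."""
    return {w for w in WORLDS if any(dist(w, v) <= n for v in models)}

def erode(models, n):
    """Worlds whose whole ball of radius n is made of models."""
    return {w for w in WORLDS
            if all(v in models for v in WORLDS if dist(w, v) <= n)}

phi = mod(lambda a, b, c: (not a) and b and c)
d1 = dilate(phi, 1)
d2 = dilate(phi, 2)
print(sorted(d1))
print(sorted(d2))
```

Working on model sets rather than on syntax reflects the independence-of-syntax property: two equivalent formulas yield the same result.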

Erosion can be computed very easily from any conjunctive normal form.
Indeed, if $\varphi$ is a disjunction of literals, i.e., $\varphi = l_1 \vee l_2 \vee \ldots \vee l_n$, then we have:

$E^1(\varphi) = \bigwedge_{j=1}^{n} \bigl(\bigvee_{i \neq j} l_i\bigr)$. (12)

This property, along with the commutativity of erosion with conjunction, makes it
easy to compute the erosion of any formula expressed as a CNF.
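As a sanity check of the clause rule, the sketch below compares the model-based erosion of size 1 with the conjunction of all sub-clauses obtained by dropping one literal, for every clause over three symbols. It assumes that the literals of a clause mention pairwise distinct symbols; the encoding of a literal as a (symbol index, sign) pair is an illustrative choice.

```python
from itertools import combinations, product

WORLDS = list(product((False, True), repeat=3))

def dist(u, v):
    return sum(x != y for x, y in zip(u, v))

def erode(models, n=1):
    """Model-based erosion with a Hamming ball of radius n."""
    return {w for w in WORLDS
            if all(v in models for v in WORLDS if dist(w, v) <= n)}

def clause(lits):
    """Models of a disjunction of literals; a literal is (symbol index, sign)."""
    return {w for w in WORLDS if any(w[i] == sign for i, sign in lits)}

def eroded_clause(lits):
    """Conjunction of the sub-clauses that each drop exactly one literal."""
    result = set(WORLDS)
    for j in range(len(lits)):
        result &= clause(lits[:j] + lits[j + 1:])
    return result

checked = 0
for size in (1, 2, 3):
    for symbols in combinations(range(3), size):
        for signs in product((False, True), repeat=size):
            lits = list(zip(symbols, signs))
            assert erode(clause(lits), 1) == eroded_clause(lits)
            checked += 1
print(checked)
```

For a unit clause the conjunction ranges over one empty disjunction, so both sides are unsatisfiable.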

3 Explanatory Relations Based on Erosion 

In this section we define precisely the concept of most central part of a formula 
with the help of the erosion operator. Then, based on this concept, we define 
two explanatory relations. 

3.1 Last Non-empty Erosion 

We denote by $E_\ell(\varphi)$ the last erosion of $\varphi$, i.e. the erosion of $\varphi$ of the largest
possible size such that the set of worlds where $E_\ell(\varphi)$ is satisfied is not empty:

$E_\ell(\varphi) = E^n(\varphi)$ where $n = \max\{k : E^k(\varphi) \nvdash \bot\}$. (13)

By convention, we set $E^0(\varphi) = \varphi$. Note that last erosion is different from the
classical notion of ultimate erosion in mathematical morphology². We define the
most central part of a formula as its last erosion. This concept is similar to one
used in preference modeling in [3].

² The ultimate erosion is obtained by successive erosions, and is defined as the union
of the connected components that disappear from one step to the other.

Let us consider the illustrative example of Section 2.2. Let us take (see Figure 2):

$\varphi = (a \vee \neg b \vee \neg c) \wedge (a \vee b \vee c)$.

Using Equations 8 and 12, we derive:

$E^1(\varphi) = (a \vee \neg b) \wedge (a \vee \neg c) \wedge (\neg b \vee \neg c) \wedge (a \vee b) \wedge (a \vee c) \wedge (b \vee c) = (a \wedge \neg b \wedge c) \vee (a \wedge b \wedge \neg c)$.

Since $E^2(\varphi) \vdash \bot$, we have $E^1(\varphi) = E_\ell(\varphi)$.
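The last erosion of this example can be computed directly on model sets. The sketch below is an illustration with hypothetical function names; it searches radii from the largest downwards and stops at 3 because there are three symbols, which is a choice made here to keep the search finite.

```python
from itertools import product

WORLDS = list(product((False, True), repeat=3))  # truth values of (a, b, c)

def mod(formula):
    return {w for w in WORLDS if formula(*w)}

def dist(u, v):
    return sum(x != y for x, y in zip(u, v))

def erode(models, n):
    return {w for w in WORLDS
            if all(v in models for v in WORLDS if dist(w, v) <= n)}

def last_erosion(models):
    """Largest-radius non-empty erosion, returned with its radius."""
    for n in range(len(WORLDS[0]), -1, -1):
        eroded = erode(models, n)
        if eroded:
            return eroded, n
    return set(), 0  # the formula itself is unsatisfiable

phi = mod(lambda a, b, c: (a or not b or not c) and (a or b or c))
central, radius = last_erosion(phi)
print(radius, sorted(central))
```

The two surviving worlds are exactly the models of the disjunction obtained in the text.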

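A preferred explanation of $\alpha$ is then defined from this operator applied on
$\Sigma \wedge \alpha$, more precisely:

$\alpha \rhd^{\ell ne} \gamma \Leftrightarrow \gamma \vdash_\Sigma E_\ell(\Sigma \wedge \alpha)$, (14)

where $\gamma \vdash_\Sigma \psi$ abbreviates $\Sigma \cup \{\gamma\} \vdash \psi$.

The idea of taking the last erosion of $\Sigma \wedge \alpha$ can be interpreted in terms of
robustness. An erosion of size $n$ of a formula is a formula that can be changed
while still proving the initial formula. If at most $n$ symbols are changed in $E^n(\varphi)$
then $\varphi$ is always satisfied. Here, considering $E_\ell(\Sigma \wedge \alpha)$ means that we are looking
at the most reduced formula that satisfies $\Sigma \wedge \alpha$, i.e. the one that can be changed
the most while satisfying $\Sigma \wedge \alpha$.

Let us take $\Sigma \wedge \alpha = \varphi$ where $\varphi$ is defined as in the previous example (Figure
2). For Definition 14, if we denote $PE^{\rhd^{\ell ne}}(\alpha) = \{\gamma : \alpha \rhd^{\ell ne} \gamma\}$ (the preferred
explanations of $\alpha$), we have:

$PE^{\rhd^{\ell ne}}(\alpha) = \{(a \wedge \neg b \wedge c), (a \wedge b \wedge \neg c), (a \wedge \neg b \wedge c) \vee (a \wedge b \wedge \neg c)\}$.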

One potential problem with last erosion is that it does not represent all
“parts” of a formula. Let us take for instance: $\Sigma \wedge \alpha = (a \vee b) \wedge (a \vee c) \wedge (b \vee c)$
and $\Sigma \wedge \beta = ((a \vee b) \wedge (a \vee c) \wedge (b \vee c)) \vee (\neg a \wedge \neg b \wedge \neg c)$. Then we have:
$E_\ell(\Sigma \wedge \alpha) = E_\ell(\Sigma \wedge \beta) = a \wedge b \wedge c$ and $PE^{\rhd^{\ell ne}}(\alpha) = PE^{\rhd^{\ell ne}}(\beta)$. The set of
worlds satisfying $\Sigma \wedge \beta$ is disconnected, and the connected component containing
only $(\neg a \wedge \neg b \wedge \neg c)$ is not represented in the explanations of $\beta$. If this is considered
to be a problem, it can be avoided by considering the ultimate erosion instead
of the last erosion.
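The loss of the isolated component can be observed on model sets. This sketch (illustrative names, brute force over the eight worlds) computes the last erosion of both formulas of the example and shows that the extra world of the second formula leaves no trace.

```python
from itertools import product

WORLDS = list(product((False, True), repeat=3))  # truth values of (a, b, c)

def mod(formula):
    return {w for w in WORLDS if formula(*w)}

def dist(u, v):
    return sum(x != y for x, y in zip(u, v))

def erode(models, n):
    return {w for w in WORLDS
            if all(v in models for v in WORLDS if dist(w, v) <= n)}

def last_erosion(models):
    """Largest-radius non-empty erosion of a set of models."""
    for n in range(len(WORLDS[0]), -1, -1):
        eroded = erode(models, n)
        if eroded:
            return eroded
    return set()

sigma_alpha = mod(lambda a, b, c: (a or b) and (a or c) and (b or c))
isolated = (False, False, False)
sigma_beta = sigma_alpha | {isolated}  # adds the disconnected world

print(sorted(last_erosion(sigma_alpha)), sorted(last_erosion(sigma_beta)))
```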
3.2 Last Consistent Erosion

Another idea consists in eroding $\Sigma$ as much as possible but still under the
constraint that it remains consistent with $\alpha$:

$E_{\ell c}(\Sigma, \alpha) = E^n(\Sigma)$ where $n = \max\{k : E^k(\Sigma) \wedge \alpha \nvdash \bot\}$. (15)

From this operator, we define the following explanatory relation:

$\alpha \rhd^{\ell c} \gamma \Leftrightarrow \gamma \vdash_\Sigma E_{\ell c}(\Sigma, \alpha) \wedge \alpha$. (16)

This definition has a different interpretation. Here we consider erosion of $\Sigma$
alone, which means that we are looking at the formulas that satisfy $\alpha$ while
being the most in the theory, i.e. that can be changed while remaining in the
theory (but not necessarily satisfying $\alpha$ after the changes).




Fig. 3. An example of last consistent erosion. 



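Let us come back to the illustrative example, and take (see Figure 3): $\Sigma =
a \vee b \vee c$, and $\alpha = (a \wedge \neg b \wedge c) \vee (a \wedge b \wedge \neg c) \vee (a \wedge \neg b \wedge \neg c)$. We have: $E^1(\Sigma) =
(a \vee b) \wedge (a \vee c) \wedge (b \vee c)$, $E^2(\Sigma) = a \wedge b \wedge c$, and finally $E^3(\Sigma) \vdash \bot$. Therefore:

$E^1(\Sigma) \wedge \alpha = (a \wedge \neg b \wedge c) \vee (a \wedge b \wedge \neg c)$

and $E^2(\Sigma) \wedge \alpha \vdash \bot$. Therefore the value of $n$ in Definition 16 is equal to 1. For
Definition 16, $\gamma$ can be anything in the set

$PE^{\rhd^{\ell c}}(\alpha) = \{(a \wedge \neg b \wedge c), (a \wedge b \wedge \neg c), (a \wedge \neg b \wedge c) \vee (a \wedge b \wedge \neg c)\}$.

There is an alternative way of looking at $\rhd^{\ell c}$ which will be particularly useful
in the next section. The iteration of the erosion operator provides a method of
linearly pre-ordering the models of $\Sigma$. Consider the following relation among
models.

$\omega \leq \omega' \Leftrightarrow \forall k\, (\omega' \in E^k(\Sigma) \Rightarrow \omega \in E^k(\Sigma))$.

It is clear that $\leq$ is a total pre-order and it is not difficult to verify that the
following holds:

$\alpha \rhd^{\ell c} \gamma \Leftrightarrow mod(\Sigma \cup \{\gamma\}) \subseteq \min(mod(\Sigma \cup \{\alpha\}), \leq)$. (17)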
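The last consistent erosion of the example can be computed as follows. This is a sketch on model sets with illustrative names; the third disjunct of the observation is taken to be the world where only a is true, which is an assumption made here to match the stated results.

```python
from itertools import product

WORLDS = list(product((False, True), repeat=3))  # truth values of (a, b, c)

def mod(formula):
    return {w for w in WORLDS if formula(*w)}

def dist(u, v):
    return sum(x != y for x, y in zip(u, v))

def erode(models, n):
    return {w for w in WORLDS
            if all(v in models for v in WORLDS if dist(w, v) <= n)}

def last_consistent_erosion(theory, observation):
    """Largest-radius erosion of the theory that still shares a model with the observation."""
    for n in range(len(WORLDS[0]), -1, -1):
        eroded = erode(theory, n)
        if eroded & observation:
            return eroded, n
    return set(), 0  # theory and observation are jointly inconsistent

sigma = mod(lambda a, b, c: a or b or c)
alpha = {(True, False, True), (True, True, False), (True, False, False)}

eroded, n = last_consistent_erosion(sigma, alpha)
preferred_models = eroded & alpha
print(n, sorted(preferred_models))
```

Up to logical equivalence under the theory, the preferred explanations are the formulas whose models form a non-empty subset of `preferred_models`, which gives the three explanations listed in the text.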
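4 Rationality Postulates

In this section we study the properties of the two proposed explanatory relations
according to the postulates introduced in [4]. The basic rationality postulates
for explanatory relations are the following (we use the notation $\alpha \vdash_\Sigma \beta$ instead
of $\Sigma \cup \{\alpha\} \vdash \beta$):

LLE$_\Sigma$: $\dfrac{\vdash_\Sigma \alpha \leftrightarrow \alpha' ;\ \alpha \rhd \gamma}{\alpha' \rhd \gamma}$

RLE$_\Sigma$: $\dfrac{\vdash_\Sigma \gamma \leftrightarrow \gamma' ;\ \alpha \rhd \gamma}{\alpha \rhd \gamma'}$

E-CM: $\dfrac{\alpha \rhd \gamma ;\ \gamma \vdash_\Sigma \beta}{(\alpha \wedge \beta) \rhd \gamma}$

E-C-Cut: $\dfrac{(\alpha \wedge \beta) \rhd \gamma ;\ \forall \delta\, [\alpha \rhd \delta \Rightarrow \delta \vdash_\Sigma \beta]}{\alpha \rhd \gamma}$

RA: $\dfrac{\alpha \rhd \gamma ;\ \gamma' \vdash_\Sigma \gamma ;\ \gamma' \nvdash_\Sigma \bot}{\alpha \rhd \gamma'}$

E-RW: $\dfrac{\alpha \rhd \gamma ;\ \alpha \rhd \delta}{\alpha \rhd (\gamma \vee \delta)}$

LOR: $\dfrac{\alpha \rhd \gamma ;\ \beta \rhd \gamma}{(\alpha \vee \beta) \rhd \gamma}$

E-DR: $\dfrac{\alpha \rhd \gamma ;\ \beta \rhd \delta}{(\alpha \vee \beta) \rhd \gamma \text{ or } (\alpha \vee \beta) \rhd \delta}$

E-R-Cut: $\dfrac{(\alpha \wedge \beta) \rhd \gamma ;\ \exists \delta\, [\alpha \rhd \delta \ \&\ \delta \vdash_\Sigma \beta]}{\alpha \rhd \gamma}$

E-Reflexivity: $\dfrac{\alpha \rhd \gamma}{\gamma \rhd \gamma}$

E-Con$_\Sigma$: $\nvdash_\Sigma \neg\alpha$ iff there is $\gamma$ such that $\alpha \rhd \gamma$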

The intended meaning and motivation for these postulates can be found in [4].

It is immediate from the definition of $\rhd^{\ell ne}$ and $\rhd^{\ell c}$ that LLE$_\Sigma$, RLE$_\Sigma$, RA,
E-RW, and E-Con$_\Sigma$ are satisfied. Moreover, from the representation of $\rhd^{\ell c}$ given
by Equation 17 and some general results of [4] we get the following proposition.

Proposition 1. $\rhd^{\ell c}$ is a causal E-rational explanatory relation. In particular,
it satisfies LLE$_\Sigma$, RLE$_\Sigma$, RA, E-RW, E-Con$_\Sigma$, E-CM and E-R-Cut.

From the results in [4] we also know that by being E-rational, $\rhd^{\ell c}$ also
satisfies E-C-Cut, E-Reflexivity, E-DR and LOR. However, the situation for $\rhd^{\ell ne}$
is quite different since, as we will see below, the basic postulates E-CM and
E-C-Cut do not hold.

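We will now provide a counter-example of E-CM for $\rhd^{\ell ne}$. Let us consider
our illustrative example (see Section 2.2), and take the following formulas (see
Figure 4):

$\Sigma \wedge \alpha = \neg a \vee b \vee c$,

$\Sigma \wedge \alpha \wedge \beta = \neg[(a \wedge b \wedge c) \vee (a \wedge \neg b \wedge c) \vee (a \wedge \neg b \wedge \neg c)] = (\neg a \vee \neg b \vee \neg c) \wedge (\neg a \vee b \vee \neg c) \wedge (\neg a \vee b \vee c)$.

Fig. 4. A counter-example for E-CM.

Using the computation formulas for erosion of a formula in CNF (Equations
8 and 12), we get:

$E^1(\Sigma \wedge \alpha) = (\neg a \vee b) \wedge (\neg a \vee c) \wedge (b \vee c)$,

$E^2(\Sigma \wedge \alpha) = \neg a \wedge b \wedge c = E_\ell(\Sigma \wedge \alpha)$.

A unique world satisfies this formula, and therefore no further erosion can be
performed ($E^3(\Sigma \wedge \alpha) \vdash \bot$). Similarly, we have:

$E^1(\Sigma \wedge \alpha \wedge \beta) = \neg a \wedge b \wedge \neg c = E_\ell(\Sigma \wedge \alpha \wedge \beta)$

which is the last non-empty erosion. It follows that $\alpha \rhd^{\ell ne} (\neg a \wedge b \wedge c)$ but clearly
$\neg a \wedge b \wedge c$ is not a preferred explanation of $\alpha \wedge \beta$.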
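The counter-example can be replayed on model sets. In this sketch (illustrative names) an explanation is represented by the set of its models within the theory, so that entailment relative to the theory becomes set inclusion; that representation is an assumption of the sketch, not notation from the text.

```python
from itertools import product

WORLDS = list(product((False, True), repeat=3))  # truth values of (a, b, c)

def mod(formula):
    return {w for w in WORLDS if formula(*w)}

def dist(u, v):
    return sum(x != y for x, y in zip(u, v))

def erode(models, n):
    return {w for w in WORLDS
            if all(v in models for v in WORLDS if dist(w, v) <= n)}

def last_erosion(models):
    """Largest-radius non-empty erosion, returned with its radius."""
    for n in range(len(WORLDS[0]), -1, -1):
        eroded = erode(models, n)
        if eroded:
            return eroded, n
    return set(), 0

# Models of the theory conjoined with the observation(s).
sigma_alpha = mod(lambda a, b, c: (not a) or b or c)
sigma_alpha_beta = mod(lambda a, b, c: not ((a and b and c) or
                                            (a and not b and c) or
                                            (a and not b and not c)))

gamma = {(False, True, True)}  # models of (not a) and b and c

core_alpha, n_alpha = last_erosion(sigma_alpha)
core_alpha_beta, n_alpha_beta = last_erosion(sigma_alpha_beta)

premise_1 = gamma <= core_alpha        # gamma is preferred for alpha
premise_2 = gamma <= sigma_alpha_beta  # gamma entails beta, given the theory
conclusion = gamma <= core_alpha_beta  # gamma is preferred for alpha and beta
print(premise_1, premise_2, conclusion)
```

Both premises of E-CM hold while its conclusion fails, as stated in the text.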
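Now we will present a counterexample of E-C-Cut for $\rhd^{\ell ne}$. Consider

$\Sigma \wedge \alpha = a \vee b \vee c$,

$\Sigma \wedge \beta = a \vee \neg b \vee \neg c$.

We have then:

$E^1(\Sigma \wedge \alpha) = (a \vee b) \wedge (a \vee c) \wedge (b \vee c)$,
$E^2(\Sigma \wedge \alpha) = a \wedge b \wedge c = E_\ell(\Sigma \wedge \alpha)$,

$E^1(\Sigma \wedge \beta) = (a \vee \neg b) \wedge (a \vee \neg c) \wedge (\neg b \vee \neg c)$,

$E^2(\Sigma \wedge \beta) = a \wedge \neg b \wedge \neg c = E_\ell(\Sigma \wedge \beta)$,

$\Sigma \wedge \alpha \wedge \beta = (a \vee b \vee c) \wedge (a \vee \neg b \vee \neg c)$,

$E^1(\Sigma \wedge \alpha \wedge \beta) = (a \wedge b \wedge \neg c) \vee (a \wedge \neg b \wedge c) = E_\ell(\Sigma \wedge \alpha \wedge \beta)$.

Let us now put $\gamma = (a \wedge b \wedge \neg c) \vee (a \wedge \neg b \wedge c)$, then $(\alpha \wedge \beta) \rhd^{\ell ne} \gamma$. Then it is
clear that $\alpha \not\rhd^{\ell ne} \gamma$. On the other hand, we have that $\alpha \rhd^{\ell ne} \delta$ iff $\delta = a \wedge b \wedge c$.

Thus if $\alpha \rhd^{\ell ne} \delta$, then $\delta \vdash_\Sigma \beta$.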



We introduce a weaker form of these postulates:

E-W-CM: $\dfrac{\alpha \rhd \gamma ;\ \beta \rhd \gamma}{(\alpha \wedge \beta) \rhd \gamma}$

E-W-C-Cut: $\dfrac{(\alpha \wedge \beta) \rhd \gamma ;\ \forall \delta\, [\alpha \rhd \delta \Rightarrow \beta \rhd \delta]}{\alpha \rhd \gamma}$

These new postulates might look even more natural than the original versions
E-CM and E-C-Cut. However, $\rhd^{\ell ne}$ is the first natural non-trivial example
known in the literature that satisfies E-W-CM and E-W-C-Cut but neither E-CM
nor E-C-Cut³. There is a natural weakening of E-R-Cut which can be considered
but we do not have any example for it in which the preferred explanations are
not unique.

The next proposition collects all the facts we know about $\rhd^{\ell ne}$.

Proposition 2. The explanatory relation $\rhd^{\ell ne}$ satisfies LLE$_\Sigma$, RLE$_\Sigma$, RA,
E-RW, E-W-CM, E-W-C-Cut, E-Reflexivity and E-Con$_\Sigma$.



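³ E-W-CM was in fact already considered by Flach [2], but he did not provide any
example satisfying it without already satisfying the stronger version E-CM.

Proof: (i) E-W-CM. Let us assume that $\gamma \vdash_\Sigma E_\ell(\Sigma \wedge \alpha)$ with $E_\ell(\Sigma \wedge \alpha) =
E^n(\Sigma \wedge \alpha)$ and $\gamma \vdash_\Sigma E_\ell(\Sigma \wedge \beta)$ with $E_\ell(\Sigma \wedge \beta) = E^m(\Sigma \wedge \beta)$. Let us assume
that the last non-empty erosion of $\Sigma \wedge \alpha \wedge \beta$ is obtained for $k$. We have, due to
Equation 8: $E_\ell(\Sigma \wedge \alpha \wedge \beta) = E^k(\Sigma \wedge \alpha \wedge \beta) = E^k(\Sigma \wedge \alpha) \wedge E^k(\Sigma \wedge \beta)$.

We necessarily have $k \leq n$ and $k \leq m$ since otherwise either $E^k(\Sigma \wedge \alpha)$ or
$E^k(\Sigma \wedge \beta)$ would be inconsistent. This implies, due to the monotonicity property
of erosion (Equation 4) that: $\vdash_\Sigma E^n(\Sigma \wedge \alpha) \rightarrow E^k(\Sigma \wedge \alpha)$ and $\vdash_\Sigma E^m(\Sigma \wedge \beta) \rightarrow
E^k(\Sigma \wedge \beta)$ from which we derive:

$\vdash_\Sigma E_\ell(\Sigma \wedge \alpha) \wedge E_\ell(\Sigma \wedge \beta) \rightarrow E_\ell(\Sigma \wedge \alpha \wedge \beta)$.

This interesting general result proves that $\gamma \vdash_\Sigma E_\ell(\Sigma \wedge \alpha \wedge \beta)$.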

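(ii) E-W-C-Cut. Let $\gamma \vdash_\Sigma E_\ell(\Sigma \wedge \alpha \wedge \beta) = E^n(\Sigma \wedge \alpha \wedge \beta)$. For all $\delta$ such that
$\alpha \rhd \delta$, $\delta \vdash_\Sigma E_\ell(\Sigma \wedge \alpha) = E^m(\Sigma \wedge \alpha)$. Since $\Sigma \wedge \alpha \wedge \beta \vdash_\Sigma \Sigma \wedge \alpha$ we have:

$E^n(\Sigma \wedge \alpha \wedge \beta) \vdash_\Sigma E^n(\Sigma \wedge \alpha) \nvdash_\Sigma \bot$.

Therefore $n \leq m$.

Let us first assume that $n < m$. For all $\delta$ such that $\alpha \rhd \delta$, we have $\beta \rhd \delta$,
i.e. $\delta \vdash_\Sigma E_\ell(\Sigma \wedge \beta) = E^k(\Sigma \wedge \beta)$. For the same reason as before, we necessarily
have $n \leq k$. Since the set of preferred explanations of $\alpha$ is included in the one
of $\beta$, we have: $E^m(\Sigma \wedge \alpha) \vdash_\Sigma E^k(\Sigma \wedge \beta)$. Since $m > n$, we have:

$E^m(\Sigma \wedge \alpha \wedge \beta) = E^m(\Sigma \wedge \alpha) \wedge E^m(\Sigma \wedge \beta) \vdash_\Sigma \bot$.

Let us now assume $n < k$. Then similarly, we have:

$E^k(\Sigma \wedge \alpha \wedge \beta) = E^k(\Sigma \wedge \alpha) \wedge E^k(\Sigma \wedge \beta) \vdash_\Sigma \bot$.

If $k > m$, we have: $E^k(\Sigma \wedge \beta) \nvdash \bot$, and, due to Equation 4: $E^k(\Sigma \wedge \beta) \vdash_\Sigma
E^m(\Sigma \wedge \beta)$. Therefore: $E^m(\Sigma \wedge \alpha) \vdash_\Sigma E_\ell(\Sigma \wedge \beta) \vdash_\Sigma E^m(\Sigma \wedge \beta)$, which implies:
$E^m(\Sigma \wedge \alpha \wedge \beta) \nvdash_\Sigma \bot$ which leads to a contradiction.

Similarly, if $k < m$, we have: $E^m(\Sigma \wedge \alpha) \nvdash \bot$, and $E^m(\Sigma \wedge \alpha) \vdash_\Sigma E^k(\Sigma \wedge \alpha)$.
Therefore, since we had $E^m(\Sigma \wedge \alpha) \vdash_\Sigma E^k(\Sigma \wedge \beta)$, we have:

$E^k(\Sigma \wedge \alpha \wedge \beta) = E^k(\Sigma \wedge \alpha) \wedge E^k(\Sigma \wedge \beta) \nvdash_\Sigma \bot$

which also leads to a contradiction. From these two contradictions, we can
conclude that necessarily $k = m$. Then $E^m(\Sigma \wedge \alpha) \vdash_\Sigma E^k(\Sigma \wedge \beta)$ becomes
$E^m(\Sigma \wedge \alpha) \vdash_\Sigma E^m(\Sigma \wedge \beta)$ and therefore we have:

$E^m(\Sigma \wedge \alpha \wedge \beta) = E^m(\Sigma \wedge \alpha) \nvdash_\Sigma \bot$

which is in contradiction with $n < m$. Therefore we also have $n = m$.

Finally the only possibility is to have $k = n = m$. In this case, we have:

$E^n(\Sigma \wedge \alpha \wedge \beta) \equiv_\Sigma E^n(\Sigma \wedge \alpha) = E^m(\Sigma \wedge \alpha) \vdash_\Sigma E^k(\Sigma \wedge \beta)$,

and therefore:

$\gamma \vdash_\Sigma E^n(\Sigma \wedge \alpha \wedge \beta) \Rightarrow \gamma \vdash_\Sigma E^n(\Sigma \wedge \alpha)$,

i.e. $\alpha \rhd \gamma$.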
(iii) E-Reflexivity. The definition of $\rhd^{\ell ne}$ is based on the notion of largest possible
erosion, and therefore no further erosion can be performed. More precisely, let
$\alpha \rhd^{\ell ne} \gamma$ and suppose that the last non-empty erosion of $\Sigma \wedge \alpha$ is $E^n(\Sigma \wedge \alpha)$.
Then we have:

$E^0(\Sigma \wedge \gamma) = \Sigma \wedge \gamma = \gamma$

and

$E^1(\Sigma \wedge \gamma) = E^{n+1}(\Sigma \wedge \alpha)$

which is inconsistent. Therefore $\gamma \rhd^{\ell ne} \gamma$. □

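We end this section by considering the postulate LOR. We will give a counter-
example of it for $\rhd^{\ell ne}$. Consider

$\Sigma \wedge \alpha = (a \vee b \vee c) \wedge (a \vee \neg b \vee \neg c)$

and

$\Sigma \wedge \beta = (\neg a \vee \neg b \vee c) \wedge (a \vee \neg b \vee c) \wedge (a \vee b \vee c)$.

We have:

$E^1(\Sigma \wedge \alpha) = (a \wedge b \wedge \neg c) \vee (a \wedge \neg b \wedge c) = E_\ell(\Sigma \wedge \alpha)$,

$E^1(\Sigma \wedge \beta) = a \wedge \neg b \wedge c = E_\ell(\Sigma \wedge \beta)$,

$\Sigma \wedge (\alpha \vee \beta) = a \vee b \vee c$,

$E^1(\Sigma \wedge (\alpha \vee \beta)) = (a \vee b) \wedge (a \vee c) \wedge (b \vee c)$,

$E^2(\Sigma \wedge (\alpha \vee \beta)) = a \wedge b \wedge c = E_\ell(\Sigma \wedge (\alpha \vee \beta))$.

Let $\gamma = a \wedge \neg b \wedge c$. Then $\alpha \rhd^{\ell ne} \gamma$ and $\beta \rhd^{\ell ne} \gamma$, but $(\alpha \vee \beta) \not\rhd^{\ell ne} \gamma$.

Since E-DR implies LOR [4], we already know that E-DR fails for $\rhd^{\ell ne}$.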
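The failure of LOR can also be checked mechanically. The sketch below uses illustrative names and represents an explanation by its set of models; the models of the theory conjoined with a disjunction of observations are obtained as a union of model sets.

```python
from itertools import product

WORLDS = list(product((False, True), repeat=3))  # truth values of (a, b, c)

def mod(formula):
    return {w for w in WORLDS if formula(*w)}

def dist(u, v):
    return sum(x != y for x, y in zip(u, v))

def erode(models, n):
    return {w for w in WORLDS
            if all(v in models for v in WORLDS if dist(w, v) <= n)}

def last_erosion(models):
    """Largest-radius non-empty erosion of a set of models."""
    for n in range(len(WORLDS[0]), -1, -1):
        eroded = erode(models, n)
        if eroded:
            return eroded
    return set()

sigma_alpha = mod(lambda a, b, c: (a or b or c) and (a or not b or not c))
sigma_beta = mod(lambda a, b, c: (not a or not b or c) and
                                 (a or not b or c) and (a or b or c))
sigma_alpha_or_beta = sigma_alpha | sigma_beta

gamma = {(True, False, True)}  # models of a and (not b) and c

holds_for_alpha = gamma <= last_erosion(sigma_alpha)
holds_for_beta = gamma <= last_erosion(sigma_beta)
holds_for_disjunction = gamma <= last_erosion(sigma_alpha_or_beta)
print(holds_for_alpha, holds_for_beta, holds_for_disjunction)
```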

Table 1 summarizes the results we obtained so far. 

5 Conclusion 

We have proposed in this paper two definitions of explanatory relations based
on morphological erosion. Several other definitions could be developed based
on mathematical morphology. For instance if we replace $\vdash$ by $\equiv$ in Equations
14 and 16, we come up with definitions that have slightly different properties
(in particular RA is not satisfied). More importantly, it is natural to use other
morphological operators instead of erosion, for example the ultimate erosion.

It is important to observe that erosion provides a geometrical way to totally
pre-order the models of a formula and this is the underlying idea behind the
definition of $\rhd^{\ell c}$.

Another interesting feature of this work is that it reveals new properties such as
E-W-C-Cut and new aspects of E-W-CM. These two postulates are very natural;
they are weakenings of the well-known E-CM and E-C-Cut. But until now the
methods used to define explanatory relations have always yielded relations satisfying
the strongest ones. So the method presented here to construct $\rhd^{\ell ne}$ is indeed
a new way of approaching the problem of selecting preferred explanations of an
observation.
Table 1. Properties of the proposed relations.

Property       | $\rhd^{\ell ne}$ (Equation 14) | $\rhd^{\ell c}$ (Equation 16)
LLE            | √ | √
RLE            | √ | √
E-CM           | × | √
E-W-CM         | √ | √
E-C-Cut        | × | √
E-R-Cut        | × | √
E-W-C-Cut      | √ | √
E-Reflexivity  | √ | √
E-RW           | √ | √
RA             | √ | √
LOR            | × | √
E-DR           | × | √
E-Con$_\Sigma$ | √ | √

Abstract. In this paper we define the rather general framework of
Monotonic Logic Programs, where the main results of (definite) logic 
programming are validly extrapolated. Whenever defining new logic pro- 
gramming extensions, we can thus turn our attention to the stipulation 
and study of its intuitive algebraic properties within the very general 
setting. Then, the existence of a minimum model and of a monotonic 
immediate consequences operator is guaranteed, and they are related as 
in classical logic programming. Afterwards we study the more restricted 
class of residuated logic programs which is able to capture several quite 
distinct logic programming semantics. Namely: Generalized Annotated 
Logic Programs, Fuzzy Logic Programming, Hybrid Probabilistic 
Logic Programs, and Possibilistic Logic Programming. We provide the 
embedding of possibilistic logic programming. 

Keywords. Logic Programming, Possibilistic Logic, Many-valued logics. 



1 Introduction 

The literature on logic programming theory is brimming with proposals of
languages and semantics for extensions of definite logic programs (e.g. [6,19,4,10]),
i.e. those without non-monotonic or default negation. Usually, the authors
characterize their programs with a model theoretic semantics, where a minimum
model is guaranteed to exist, and a corresponding monotonic fixpoint operator
(continuous or not). In many cases the semantics is many-valued.

In this paper we abstract out all the details and define a rather general
framework of Monotonic Logic Programs to capture the core “spirit” of logic
programming. For this purpose we follow an algebraic approach to both the
language and the semantics of logic programs. We were inspired by the deep
theoretical results of many-valued logics and fuzzy logic (see [1,9] for excellent
accounts) and applied these ideas to logic programming. In fact, a preliminary
work in this direction is [19], but the authors restrict themselves to a linearly
ordered set of truth-values (the real closed interval [0, 1]) and to a very limited
syntax: the head of a rule is a literal and the body is a multiplication (t-norm)
of literals. We start by defining the notion of an implication symbol, sufficient
to guarantee the validity of the standard logic programming results. Later on
we resort to residuated lattices (cf. [1,9]), where a generalized modus ponens
rule is defined. This characterizes the essence of logic programming: from the
truth-value of bodies of rules for an atom we can determine the truth-value of
that atom, depending on the degree of confidence in each of the rules.

Our paper proceeds as follows. In Section 2 we introduce the language of
Monotonic Logic Programs and the associated implication algebras. Section 3
presents our main theoretical results. Section 4 sets forth the definitions
of residuated lattices and shows that the Residuated Logic Programs of [2]
are a special instance of Monotonic Logic Programs, where one associates
with each rule a weight or confidence factor. Section 5 shows the embedding
of Possibilistic Logic Programming into Residuated Logic Programs, and
Section 6 closes with some conclusions and future directions.

2 Monotonic Logic Programs 

The theoretical foundations of logic programming were clearly established in
[11,16] for definite logic programs (see also [12]), i.e. programs made up of
rules of the form $A_0 \subset A_1 \wedge \dots \wedge A_n$ ($n \geq 0$)
where each $A_i$ ($0 \leq i \leq n$) is a propositional symbol (an atom),
$\subset$ is classical implication, and $\wedge$ the usual Boolean
conjunction. In this section we generalize the language of definite logic
programs in order to encompass more complex bodies and heads and, evidently,
many-valued logics. For simplicity, we consider only the propositional
(ground) case.

When defining a (new) logic it is necessary to address the following two 
distinct but related aspects: the syntax and the corresponding interpretation 
of the logical symbols in the language. In this paper we adopt an algebraic 
characterization of both the language and interpretation of operators. This is a 
very general and powerful framework, allowing for a simple relation between the 
two. For lack of space, we reduce the presentation to the essentials. For more 
details consult for instance [8]. 

The main assumptions of the paper are collected in the next two definitions. 

Definition 1 (Implication Algebra). Let
$\mathcal{T} = \langle T, \preceq \rangle$ be a complete upper semilattice¹
and consider an algebra $\mathfrak{A}$ on the carrier set $T$. We say that
$\mathfrak{A}$ is an implication algebra with respect to $\mathcal{T}$ iff it
has defined an operator $\leftarrow$ on $\mathfrak{A}$ such that, for all
$a_1, a_2 \in T$, $(a_1 \leftarrow a_2) = \top$ iff $a_2 \preceq a_1$, where
$\top$ is the top element of $\mathcal{T}$.

Example 1. The closed real interval [0, 1] with the usual ordering is a
complete lattice. The algebra $\mathfrak{G}$ on [0, 1] with Gödel implication
$x \Leftarrow y = 1$ (if $x \geq y$), and $x \Leftarrow y = x$ otherwise, is
an implication algebra. It is obvious that if $x < y$ then
$(x \Leftarrow y) < 1$.
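
The property required of an implication algebra can be checked mechanically
on a finite sample of truth-values. The sketch below is only an illustration:
the function names and the sample grid are assumptions of this example, and a
check on a grid is of course not a proof for the whole interval [0, 1].

```python
from fractions import Fraction
from itertools import product

def godel_implies(x, y):
    """Goedel implication with consequent x and antecedent y."""
    return Fraction(1) if x >= y else x

# Sample of [0, 1]; exact rationals avoid floating-point noise.
GRID = [Fraction(i, 10) for i in range(11)]

def is_implication_algebra(implies, values, top):
    """Implication-algebra property on a sample: (a1 <- a2) = top iff a2 <= a1."""
    return all((implies(a1, a2) == top) == (a2 <= a1)
               for a1, a2 in product(values, repeat=2))

print(is_implication_algebra(godel_implies, GRID, Fraction(1)))
```

An operator that returns the top element too often, or too rarely, fails this
check, which is the situation discussed for Reichenbach's connectives later on.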

Note that some many-valued logics have implication connectives which do
not obey the property of implication algebras. We shall illustrate this in
the next section.

¹ The original formulation of this definition assumed a complete lattice. As
shown in [13], we can easily generalize to complete upper semilattices since
the meet operation over infinite sets is never used. This is also the case
for GAPs [10].
Our Monotonic and Residuated Logic Programs will be constructed from the
abstract syntax induced by an implication algebra and a set of propositional
symbols. The method of relating syntax and semantics in such an algebraic
setting is well-known and we refer again to [8] for more details.

Definition 2 (Monotonic Logic Programs). Let $\mathfrak{A}$ be an implication
algebra with respect to a complete lattice $\mathcal{T}$. Let $\Pi$ be a set
of propositional symbols and $FORM_{\mathfrak{A}}(\Pi)$ the corresponding
algebra of formulae freely generated from $\Pi$ and the “symbols” of
operators in $\mathfrak{A}$. A monotonic logic program is a set of rules of
the form $A \leftarrow \Psi$ such that:

1. The rule $(A \leftarrow \Psi)$ is a formula of $FORM_{\mathfrak{A}}(\Pi)$;

2. The head of the rule $A$ is a propositional symbol of $\Pi$;

3. The body formula $\Psi$ with propositional symbols $B_1, \dots, B_n$
($n \geq 0$) corresponds to an isotonic function having those symbols as
arguments.

As usual we shall represent binary connectives in infix notation. 

A rule of a monotonic logic program expresses a (monotonic) computation
rule of the truth-value of the head propositional symbol from the
truth-values of the symbols in the body. The monotonicity of the rule is
guaranteed by the isotonicity of the function corresponding to the body
formula $\Psi$: if an argument of $\Psi$ is monotonically increased then the
truth-value of $\Psi$ also monotonically increases. The unique homomorphic
extension theorem guarantees that for every interpretation of propositional
symbols there is a unique associated valuation function.

Example 2. Consider the set of propositional symbols $\Pi = \{a, b, c\}$ and
the implication algebra of Example 1 with the additional operator
$x \wedge y = \min(x, y)$. As is common, we denote the algebra by
$\mathfrak{G}([0, 1], \Leftarrow, \wedge)$. The arity of the operators is
implicit. Note that $\wedge$ is the meet in [0, 1]. The formulas $b \wedge c$
and $a \Leftarrow b \wedge c$ correspond to the functions
$\lambda bc.\, \wedge(b, c)$ and $\lambda abc.\, \Leftarrow(a, \wedge(b, c))$,
respectively. Variables range over the interval [0, 1].

Note too that we employ the same symbol to represent a connective at the
syntactic level (formulas) and the corresponding operator in the underlying
algebra. This simplifies presentation. A simple analysis shows that
$a \Leftarrow b \wedge c$ is a correct monotonic logic program rule, since
$\lambda bc.\, \wedge(b, c) = \lambda bc.\, \min(b, c)$ is an isotonic
function.
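
The correspondence between formulas and functions can be made concrete with a
small evaluator. This is a sketch under stated assumptions: formulas are
encoded as nested tuples, the operator names "and" and "impl" are invented for
the illustration, and isotonicity is only tested by brute force on a finite
grid.

```python
from fractions import Fraction
from itertools import product

# Operators of the algebra of Example 2 (the dictionary keys are illustrative).
OPS = {
    "and": min,                                          # meet in [0, 1]
    "impl": lambda x, y: Fraction(1) if x >= y else x,   # Goedel implication
}

def valuation(formula, interp):
    """Extend an interpretation (dict: symbol -> value) to formulas.
    A formula is a symbol (str) or a tuple (operator, left, right)."""
    if isinstance(formula, str):
        return interp[formula]
    op, left, right = formula
    return OPS[op](valuation(left, interp), valuation(right, interp))

def is_isotonic(formula, symbols, values):
    """Brute-force check that raising arguments never lowers the value."""
    points = [dict(zip(symbols, vs))
              for vs in product(values, repeat=len(symbols))]
    return all(valuation(formula, i) <= valuation(formula, j)
               for i in points for j in points
               if all(i[s] <= j[s] for s in symbols))

GRID = [Fraction(k, 4) for k in range(5)]
body = ("and", "b", "c")        # b AND c
rule = ("impl", "a", body)      # a <= b AND c
print(is_isotonic(body, ["b", "c"], GRID))
```

A body built with the implication itself, such as ("impl", "a", "b"), is
antitonic in its second argument and is rejected by the same check, so it
would not be admissible as a rule body.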



3 Model and Fixpoint Theory for Monotonic Logic 
Programs 

In this section we define a model and a fixpoint theory for Monotonic Logic 
Programs, and extend to them the classical results of logic programming. The 
important point to realize is that all the fundamental results of logic program- 
ming depend only on the monotonicity of the body of the rule and on the fact 
that it is possible to determine the truth-value of the proposition in the head 
from the truth-value of the rule body. Also notice that we demand all the
rules to be satisfied: every implication should evaluate to $\top$.

Let us start by defining the notion of interpretation. An interpretation is
simply an assignment of a truth-value to each propositional symbol in the
language. We assume, in the remainder of this section, an implication algebra
$\mathfrak{A}$ with respect to a complete lattice
$\mathcal{T} = \langle T, \preceq \rangle$. The implication operator and the
corresponding symbol will both be denoted by $\leftarrow$. Consider also that
a set $\Pi$ of propositional symbols is given, as well as the corresponding
algebra $FORM_{\mathfrak{A}}(\Pi)$ of formulae over $\Pi$. With this in place,
the notion of interpretation is straightforward:

Definition 3 (Interpretation). An interpretation is a mapping
$I : \Pi \to T$. By the unique homomorphic extension theorem, the
interpretation extends uniquely to a valuation function
$\hat{I} : FORM_{\mathfrak{A}}(\Pi) \to T$. The set of all interpretations
with respect to the implication algebra $\mathfrak{A}$ is denoted by
$\mathcal{I}_{\mathfrak{A}}$.

The ordering $\preceq$ on the truth-values in $T$ is extended to the set of
interpretations as follows:

Definition 4 (Lattice of interpretations). Consider
$\mathcal{I}_{\mathfrak{A}}$ the set of all interpretations with respect to
implication algebra $\mathfrak{A}$, and two arbitrary interpretations
$I_1, I_2 \in \mathcal{I}_{\mathfrak{A}}$. Then,
$\langle \mathcal{I}_{\mathfrak{A}}, \sqsubseteq \rangle$ is a complete
lattice where $I_1 \sqsubseteq I_2$ iff $\forall p \in \Pi$:
$I_1(p) \preceq I_2(p)$. The least interpretation $\triangle$ maps every
propositional symbol to the least element of $T$, and the greatest
interpretation $\triangledown$ maps every propositional symbol to the top
element of the complete lattice of truth values $\mathcal{T}$.

A rule of a monotonic logic program is satisfied whenever the truth-value
of the rule is $\top$. A model is an interpretation which satisfies every
rule in the program. Formally:

Definition 5 (Model of a program). Consider an interpretation
$I \in \mathcal{I}_{\mathfrak{A}}$. A monotonic logic program rule
$A \leftarrow \Psi$ is satisfied by $I$ iff
$\hat{I}((A \leftarrow \Psi)) = \top$. Interpretation $I$ is a model of a
monotonic logic program $P$ iff all rules in $P$ are satisfied by $I$.

We proceed by showing that every monotonic logic program has a least model
which is the least fixpoint of a monotonic operator, along with other
standard logic programming results. The key construction is the immediate
consequences operator, which extends the results of van Emden and Kowalski
[16] to the general theoretical setting of implication algebras:

Definition 6 (Immediate consequences operator). Let $P$ be a monotonic logic
program. Define the immediate consequences operator
$T_P : \mathcal{I}_{\mathfrak{A}} \to \mathcal{I}_{\mathfrak{A}}$, mapping
interpretations to interpretations, as:

    $T_P(I)(A) = \mathrm{lub} \{ \hat{I}(\Psi) \text{ such that } A \leftarrow \Psi \in P \}$

where $A$ is a propositional symbol.
The immediate consequences operator evaluates the body of every rule for a
propositional symbol $A$. The truth-value of $A$ is simply the least upper
bound of the truth-values of all the bodies of the rules for it.

Theorem 1 (Monotonicity of the immediate consequences operator). Let $I_1$
and $I_2$ be two interpretations in $\mathcal{I}_{\mathfrak{A}}$ and $P$ a
monotonic logic program. Operator $T_P$ is monotonic, i.e. if
$I_1 \sqsubseteq I_2$ then $T_P(I_1) \sqsubseteq T_P(I_2)$.

As usual, the set of models of $P$ is characterized by the post-fixpoints of
$T_P$:

Theorem 2. An interpretation $I$ of $\mathcal{I}_{\mathfrak{A}}$ is a model
of a monotonic logic program $P$ iff $T_P(I) \sqsubseteq I$.

By the Knaster-Tarski fixpoint theorem, $T_P$ has a least fixpoint. Thus:

Definition 7 (Semantics of Monotonic Logic Programs). Let $P$ be a monotonic
logic program and $M_P$ the least fixpoint of $T_P$. The semantics of a
monotonic logic program is given by $M_P$, being its least model.

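Theorem 3 (Fixpoint Semantics). Let $P$ be a monotonic logic program, and
consider the transfinite sequence of interpretations of
$\mathcal{I}_{\mathfrak{A}}$:

    $T_P \uparrow 0 = \triangle$
    $T_P \uparrow (n+1) = T_P(T_P \uparrow n)$, if $n+1$ is a successor ordinal
    $T_P \uparrow \alpha = \bigsqcup_{\beta < \alpha} T_P \uparrow \beta$, if $\alpha$ is a limit ordinal

Then, there is an ordinal $\lambda$ such that
$T_P(T_P \uparrow \lambda) = T_P \uparrow \lambda$, and the least model of
$P$ is $M_P = T_P \uparrow \lambda$.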

The major difference from standard classical logic programming is that our
$T_P$ operator might not be continuous, and therefore more than $\omega$
iterations may be necessary to “reach” the least fixpoint. This possibility
is unavoidable if one wants to retain generality. All the other important
results carry over to our general framework. For the study of sufficient
conditions to guarantee the continuity of the $T_P$ operator, see [13].
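
For a finite propositional program over a chain of truth-values the iteration
can be sketched directly. The following is a minimal illustration, not the
authors' implementation: rule bodies are given as Python functions, the lub is
computed with max (valid only because the truth-values here are totally
ordered), the program and the step limit are hypothetical, and the model test
uses the post-fixpoint reading of models, i.e. that T_P(I) lies pointwise
below I.

```python
from fractions import Fraction

def immediate_consequences(program, interp, bottom):
    """One application of T_P: for each symbol, the lub of the body values of
    its rules (max suffices on a chain; the lub of no bodies is bottom)."""
    return {
        atom: max((body(interp) for head, body in program if head == atom),
                  default=bottom)
        for atom in interp
    }

def least_fixpoint(program, symbols, bottom, max_steps=1000):
    """Iterate T_P from the least interpretation until it stabilizes.
    Termination in finitely many steps is not guaranteed in general,
    hence the (arbitrary) step limit."""
    interp = {atom: bottom for atom in symbols}
    for _ in range(max_steps):
        nxt = immediate_consequences(program, interp, bottom)
        if nxt == interp:
            return interp
        interp = nxt
    raise RuntimeError("no fixpoint reached within max_steps")

def is_model(program, interp, bottom):
    """Post-fixpoint test: T_P(I) must be pointwise below I."""
    image = immediate_consequences(program, interp, bottom)
    return all(image[atom] <= interp[atom] for atom in interp)

# Hypothetical program: a <- b AND c,  b <- 7/10,  c <- 2/5  (AND is min).
PROGRAM = [
    ("a", lambda i: min(i["b"], i["c"])),
    ("b", lambda i: Fraction(7, 10)),
    ("c", lambda i: Fraction(2, 5)),
]
LEAST = least_fixpoint(PROGRAM, ["a", "b", "c"], Fraction(0))
print(LEAST)
```

Here the iteration stabilizes after three applications of the operator. For
programs whose operator is not continuous, a finite loop of this kind can only
approximate the least fixpoint from below.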

We now illustrate the importance of the provisions of Definition 1 with an 
example: 

Example 3. Reichenbach [15] devised a calculus for addressing the logical
problems raised by quantum mechanics. He defined three implications, three
negations, two equivalences, a conjunction and a disjunction. For our
example, the three implication operators and conjunction will suffice. The
set of truth-values is {0, 1, 2} with the usual ordering. The truth-tables²
are:



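    $\subset$ | 0 1 2    $\Leftarrow$ | 0 1 2    $\leftarrow$ | 0 1 2    $\wedge$ | 0 1 2
    ----------+------    -------------+------    -------------+------    ---------+------
        0     | 2 1 0          0      | 2 2 0          0      | 1 1 0        0    | 0 0 0
        1     | 2 2 1          1      | 2 2 0          1      | 1 1 1        1    | 0 1 1
        2     | 2 2 2          2      | 2 2 2          2      | 1 1 2        2    | 0 1 2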


² For the implication connectives, the consequent truth-value appears in rows
and the antecedent one in columns.
Consider the three programs below:

    $a \subset b \wedge c$      $a \Leftarrow b \wedge c$      $a \leftarrow b \wedge c$
    $b \subset 1$               $b \Leftarrow 1$               $b \leftarrow 1$
    $c \subset 1$               $c \Leftarrow 1$               $c \leftarrow 1$

The least fixpoint of the immediate consequences operator when applied to the
above programs results in the same interpretation, mapping propositions $a$,
$b$, and $c$ to truth-value 1. The first implication ($\subset$) complies
with the provisions of Definition 1. Therefore the interpretation so obtained
is the least model of the program on the left, as can be checked easily.

For the middle program, we know that $t_1 \geq t_2$ implies
$(t_1 \Leftarrow t_2) = 2$, for $t_1, t_2 \in \{0, 1, 2\}$, but the converse
does not hold. In general, the least fixpoint is a model of the program but
might not be minimal, as in this case: the interpretation mapping every
proposition to 0 is a model.

Finally, for the program on the right, we have that
$(t_1 \leftarrow t_2) = 2$ entails $t_1 \geq t_2$ but not conversely. This
situation is more problematic, since the least fixpoint might not be a model
of the program, as per the example. In fact, this program has no model
because the implications will always be evaluated to a truth-value different
from 2.
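
The three cases can be verified exhaustively, since there are only 27
interpretations of a, b and c. The sketch below assumes the truth tables are
read with the consequent indexing rows and the antecedent indexing columns;
the names LEFT, MIDDLE and RIGHT simply follow the position of the three
programs and are not Reichenbach's terminology.

```python
from itertools import product

VALUES = (0, 1, 2)
TOP = 2

# TABLE[consequent][antecedent] for the three implication connectives.
LEFT = [[2, 1, 0], [2, 2, 1], [2, 2, 2]]
MIDDLE = [[2, 2, 0], [2, 2, 0], [2, 2, 2]]
RIGHT = [[1, 1, 0], [1, 1, 1], [1, 1, 2]]

def satisfies_definition_1(table):
    """(t1 <- t2) = TOP iff t1 >= t2, for all truth-values."""
    return all((table[t1][t2] == TOP) == (t1 >= t2)
               for t1, t2 in product(VALUES, repeat=2))

def is_model(table, interp):
    """Rules a <- b AND c, b <- 1, c <- 1 under the given implication."""
    a, b, c = interp
    return (table[a][min(b, c)] == TOP
            and table[b][1] == TOP
            and table[c][1] == TOP)

def models(table):
    return [i for i in product(VALUES, repeat=3) if is_model(table, i)]

for name, table in (("left", LEFT), ("middle", MIDDLE), ("right", RIGHT)):
    print(name, satisfies_definition_1(table), len(models(table)))
```

Only the first connective passes the check of Definition 1. For the middle
one, the all-zero interpretation is a model strictly below the least fixpoint,
and for the last one the list of models is empty.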

4 Residuated Logic Programs 

In a non-classical setting the intended truth-value of a given rule might
not be absolute truth. So, a generalization of Modus Ponens is required to
reason logically with confidence factors. In many-valued logics this issue is
very well understood, namely in Fuzzy Propositional Logics [14,1,9]. Since
one of our initial goals was to capture Fuzzy Logic Programming [5,19], it is
natural to adopt as the semantical basis the residuated lattices (see for
instance [1]).

Definition 8 (Adjoint pair). Let $\langle P, \preceq_P \rangle$ be a
partially ordered set and $(\leftarrow, \otimes)$ a pair of binary operations
in $P$. We say that $(\leftarrow, \otimes)$ forms an adjoint pair in
$\langle P, \preceq_P \rangle$ iff:

$(a_1)$ Operation $\otimes$ is isotonic, i.e. if $x_1, x_2, y \in P$ such
that $x_1 \preceq_P x_2$ then $(x_1 \otimes y) \preceq_P (x_2 \otimes y)$ and
$(y \otimes x_1) \preceq_P (y \otimes x_2)$;

$(a_2)$ Operation $\leftarrow$ is isotonic in the first argument (the
consequent) and antitonic in the second argument (the antecedent), i.e. if
$x_1, x_2, y \in P$ such that $x_1 \preceq_P x_2$ then
$(x_1 \leftarrow y) \preceq_P (x_2 \leftarrow y)$ and
$(y \leftarrow x_2) \preceq_P (y \leftarrow x_1)$;

$(a_3)$ For any $x, y, z \in P$, we have that $x \preceq_P (y \leftarrow z)$
holds if and only if $(x \otimes z) \preceq_P y$ holds.

The intuition for the first two properties is immediate; the third one may be
more difficult to grasp. In one direction, it is simply asserting that the
following Fuzzy Modus Ponens rule is valid (cf. [9]):

    If $x$ is a lower bound of $\psi \leftarrow \varphi$, and $z$ is a lower
    bound of $\varphi$ then a lower bound $y$ of $\psi$ is $x \otimes z$.

The other direction ensures that the truth-value of $y \leftarrow x$ is in
fact the maximal $z$ satisfying $x \otimes z \preceq_P y$. Note that the
implication symbol in an adjoint pair must obey the provisions of
Definition 1.
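
The adjointness condition, x below (y <- z) iff (x times z) below y, can be
tested on a finite sample for familiar pairs on [0, 1]. The two pairs below,
the Gödel pair and the product pair, are standard examples chosen for this
illustration and are not taken from the text; the function names and the grid
are likewise assumptions, and a grid check does not replace a proof.

```python
from fractions import Fraction
from itertools import product

GRID = [Fraction(i, 10) for i in range(11)]

def is_adjoint_pair(implies, times, values):
    """Sampled check of: x <= (y <- z) iff (x (*) z) <= y."""
    return all((x <= implies(y, z)) == (times(x, z) <= y)
               for x, y, z in product(values, repeat=3))

# Goedel pair: multiplication is min, residuum is Goedel implication.
godel_times = min

def godel_implies(y, z):
    return Fraction(1) if z <= y else y

# Product pair: ordinary multiplication and its residuum.
def product_times(x, z):
    return x * z

def product_implies(y, z):
    return Fraction(1) if z <= y else y / z

print(is_adjoint_pair(godel_implies, godel_times, GRID))
print(is_adjoint_pair(product_implies, product_times, GRID))
```

Mixing the operations of the two pairs, for instance min with the product
residuum, breaks the equivalence, which shows that the implication is
determined by the multiplication it is paired with.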

Besides $(a_1)$-$(a_3)$, it is necessary to assume the existence of a bottom
and a top element in the lattice of truth-values, and that $\top$ is a unit
element of $\otimes$. It is also usual to assume, additionally, extra
conditions on the multiplication operation ($\otimes$), namely associativity
and commutativity.

Definition 9 (Residuated Lattice). Consider the lattice
$\mathcal{L} = \langle L, \preceq \rangle$. We say that
$(\mathcal{L}, \leftarrow, \otimes)$ is a residuated lattice whenever the
following three conditions are met:

$(l_1)$ $\mathcal{L}$ is a bounded lattice: it has a bottom ($\bot$) and a
top element ($\top$);

$(l_2)$ $(\leftarrow, \otimes)$ is an adjoint pair in $\mathcal{L}$;

$(l_3)$ $(L, \otimes, \top)$ is a commutative monoid³.

We say that the residuated lattice is complete whenever
$\langle L, \preceq \rangle$ is complete. In this case condition $(l_1)$ is
immediately satisfied.

For residuated logic programs we resort to a special implication algebra
where a multiplication operation is defined, and the corresponding residuum
operation (or implication), plus a constant representing the top element of
the lattice of truth-values (whose set is the carrier of the algebra).
Together they must define a complete residuated lattice since we intend to
deal with infinite programs (theories). Obviously, a residuated algebra may
have additional operators. Formally:

Definition 10 (Residuated Algebra). Let $(\mathcal{L}, \leftarrow, \otimes)$
be a complete residuated lattice. The implication algebra $\mathfrak{A}$
defining operators $\leftarrow, \otimes$ with respect to $\mathcal{L}$ is a
residuated algebra.

From the example at the beginning of this section, it should be clear that in
order to define the syntax of residuated logic programs it is necessary to
know beforehand the underlying truth-value residuated algebra, given that
each program rule must be associated with a truth-value. Thus, it is natural
to generalize the language of monotonic logic programs as follows (for a
particular instance see [19]):

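Definition 11 (Residuated Logic Programs). Let $\mathfrak{R}$ be a residuated
algebra with respect to a residuated lattice
$(\mathcal{T}, \leftarrow, \otimes)$. A residuated logic program is a
monotonic logic program over $\mathfrak{R}$ with rules of the form
$A \leftarrow (\vartheta \otimes \Psi)$ where $\vartheta$ is a truth-value in
$T$, and $\Psi$ an arbitrary isotonic formula. A (weighted) rule
$A \leftarrow (\vartheta \otimes \Psi)$ is represented⁴ by
$A \xleftarrow{\vartheta} \Psi$.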

³ It is shown in [13] that $(l_3)$ can be substituted by the weaker condition
$\top \otimes \vartheta = \vartheta \otimes \top = \vartheta$.

⁴ We are assuming that every truth-value has a corresponding constant mapping
in $\mathfrak{R}$.
Consequently, residuated logic programs are a special class of monotonic ones.
The important result is the following:

Theorem 4 (Model of a Residuated Logic Program). An interpretation
$I \in \mathcal{I}_{\mathfrak{R}}$ is a model of a residuated logic program
$P$ iff $\hat{I}(A \leftarrow \Psi) \succeq \vartheta$ for every weighted
rule $A \xleftarrow{\vartheta} \Psi$ in $P$.

Therefore, all the theorems of monotonic logic programs carry over directly 
to residuated ones, under the notion of model portrayed in the above theorem. 
In [13] several implication operators can be put to use. However, it is easily seen 
that the embedding of Definition 11 can be trivially adapted to handle several 
adjoint pairs in the lattice of truth-values, and so the corresponding notion of 
model is the same. Thus Multi-Adjoint Logic Programs are indeed an instance 
of Monotonic ones. 

5 Possibilistic Logic Programming 

As summarized in [7], possibilistic logic is a logic of uncertainty tailored for 
reasoning under incomplete evidence and partially inconsistent knowledge. Each 
formula in the language is assigned a weight in a totally ordered set, correspond- 
ing to a lower bound on the degree of necessity or possibility of the formula. 
The degree of necessity of a formula states to what extent the available evidence 
entails the truth of the formula, while the degree of possibility expresses to what 
extent the truth of a formula is not incompatible with the available evidence. 
The theory is built upon the notion of possibility measure $\Pi$ [20],
obeying the following axioms:

    $\Pi(\bot) = 0$    $\Pi(\top) = 1$    $\Pi(p \vee q) = \max(\Pi(p), \Pi(q))$

The logic is not truth-functional since in general it is only guaranteed that
$\Pi(p \wedge q) \leq \min(\Pi(p), \Pi(q))$. We base our presentation on [6]
and consider as well only propositional clauses. For the full theory the
reader is referred to the excellent overview of [7].

A propositional possibilistic clause is either a pair $(c\ (N\,\alpha))$ or a
pair $(c\ (\Pi\,\beta))$, where $c$ is a propositional clause,
$\alpha \in\ ]0, 1]$ and $\beta \in [0, 1]$. In general terms, the semantics
is engendered from the possibility measures which satisfy the set of
possibilistic clauses:

– The formula $(c\ (N\,\alpha))$ states that $c$ is certain at least to
  degree $\alpha$, i.e. $N(c) \geq \alpha$, which is equivalent to
  $\Pi(\neg c) \leq 1 - \alpha$.

– The expression $(c\ (\Pi\,\beta))$ states that $c$ is possible in some
  world at least to degree $\beta$, i.e. $\Pi(c) \geq \beta$.

Notice that $\Pi$ and $N$ are dual measures of possibility and necessity
governed by the equation $N(c) = 1 - \Pi(\neg c)$. The truth-values
$(\Pi\,0), \dots, (\Pi\,\beta), \dots, (\Pi\,1), \dots, (N\,\alpha), \dots, (N\,1)$
are totally ordered. We denote the complete lattice formed this way by
$\mathcal{V}$. The semantics of a possibilistic set of clauses $\mathcal{F}$
is given by a consequence relation. We say that $(c\ w)$ is a logical
consequence of $\mathcal{F}$ (denoted by $\mathcal{F} \models (c\ w)$) if
every possibility measure that satisfies $\mathcal{F}$ also satisfies
$(c\ w)$.
There is a formal system [7] which is sound and complete with respect to the
above inconsistency-tolerant semantics of possibilistic logic. The inference
rules for the propositional case are:

    (GMP)  $(\varphi\ w_1), (\varphi \to \psi\ w_2) \vdash (\psi\ w_1 * w_2)$
    (S)    $(\varphi\ w_1) \vdash (\varphi\ w_2)$, $\forall w_2 \leq w_1$



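where operation $*$ is defined by

    $(N\,\alpha) * (N\,\beta) = (N\,\min(\alpha, \beta))$
    $(N\,\alpha) * (\Pi\,\beta) = (\Pi\,\beta)$ if $\alpha + \beta > 1$, and $(\Pi\,0)$ if $\alpha + \beta \leq 1$
    $(\Pi\,\alpha) * (\Pi\,\beta) = (\Pi\,0)$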



The important point for our discussion is that $*$ is a multiplication
operation and therefore we can define an appropriate residuum operator
($\leftarrow$) such that jointly they form an adjoint pair. In our setting,
and given an interpretation $I$ such that $\hat{I}(\varphi) = w_1$, we know
that

    $\hat{I}(\psi \leftarrow \varphi) \geq w_2$ iff $\hat{I}(\psi) \geq \hat{I}(\varphi) * w_2$ iff $\hat{I}(\psi) \geq w_1 * w_2$

similarly to the above inference rule (GMP). Notice that our interpretations
might not be models of the possibilistic theory. However, we are able to extract 
some information regarding the possibility or necessity degree of propositional 
symbols. This is a technique quite similar to the one employed by Kifer and Sub- 
rahmanian [10] for embedding van Emden’s quantitative rules into GAPs [17]. 

To simplify the presentation we map the truth values in $\mathcal{V}$ to the
real interval [0, 1], as follows: truth-value $(\Pi\,\beta)$ is mapped to
$\beta/2$ and truth-value $(N\,\alpha)$ corresponds to $(1+\alpha)/2$. This
is a bijection. The multiplication operator $*$ is isomorphic to operator
$\times$ in the definition next:

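Definition 12 (Possibilistic Residuated Algebra). Let $\mathfrak{P}$ be the
residuated algebra on [0, 1] with multiplication $\times$ and implication
$\leftarrow$ operators defined by:

    $p_1 \times p_2 = 0$ if $p_1 + p_2 \leq 1$
    $p_1 \times p_2 = \min(p_1, p_2)$ if $p_1 + p_2 > 1$

    $p_1 \leftarrow p_2 = 1$ if $p_1 \geq p_2$
    $p_1 \leftarrow p_2 = p_1$ if $p_1 < p_2$ and $p_1 + p_2 > 1$
    $p_1 \leftarrow p_2 = 1 - p_2$ if $p_1 < p_2$ and $p_1 + p_2 \leq 1$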
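
The operations of this algebra can be written down and tested against the
adjointness condition and against the operation * on weights. The sketch
below rests on two assumptions that should be read as such: the encoding of
(Π β) as β/2 and of (N α) as (1 + α)/2, which agrees with the weights used in
Example 5, and the symmetric treatment of mixed weights in star, since the
text only spells out the case (N α) * (Π β). All function names are
illustrative.

```python
from fractions import Fraction
from itertools import product

def times(p1, p2):
    """Multiplication of the possibilistic residuated algebra."""
    return min(p1, p2) if p1 + p2 > 1 else Fraction(0)

def implies(p1, p2):
    """Residuum p1 <- p2 (consequent p1, antecedent p2)."""
    if p1 >= p2:
        return Fraction(1)
    return p1 if p1 + p2 > 1 else 1 - p2

def encode(weight):
    """Assumed encoding: ("Pi", b) -> b/2 and ("N", a) -> (1 + a)/2."""
    kind, degree = weight
    return degree / 2 if kind == "Pi" else (1 + degree) / 2

def star(w1, w2):
    """Operation * on weights; mixed cases are treated symmetrically."""
    (k1, d1), (k2, d2) = w1, w2
    if k1 == "N" and k2 == "N":
        return ("N", min(d1, d2))
    if k1 == "Pi" and k2 == "Pi":
        return ("Pi", Fraction(0))
    nec, pos = (d1, d2) if k1 == "N" else (d2, d1)
    return ("Pi", pos) if nec + pos > 1 else ("Pi", Fraction(0))

GRID = [Fraction(i, 20) for i in range(21)]
WEIGHTS = ([("Pi", Fraction(i, 10)) for i in range(11)]
           + [("N", Fraction(i, 10)) for i in range(1, 11)])

adjoint = all((x <= implies(y, z)) == (times(x, z) <= y)
              for x, y, z in product(GRID, repeat=3))
isomorphic = all(encode(star(w1, w2)) == times(encode(w1), encode(w2))
                 for w1, w2 in product(WEIGHTS, repeat=2))
print(adjoint, isomorphic)
```

On these samples both checks succeed, which is consistent with the claim that
the multiplication on [0, 1] mirrors * and that the two operators form an
adjoint pair.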

A Possibilistic Logic Program [6] is a finite set of (first-order) possibilistic 
Horn Clauses annotated only with necessity degrees. We consider just the case 
of propositional Possibilistic Logic Programs. We have the following expected 
result: 
Theorem 5. Let $\mathcal{F}$ be a propositional possibilistic logic program,
i.e. a set of possibilistic Horn clauses
$(B_1 \wedge \dots \wedge B_n \to A\ w)$ where $A, B_1, \dots, B_n$ are
propositional symbols and $w$ is a weight of the form $(N\,\alpha)$. We
construct a monotonic logic program $P$ over $\mathfrak{P}$ by translating
each possibilistic Horn clause to the attendant monotonic logic program rule
$A \leftarrow p \times B_1 \times \dots \times B_n$ where $p = (1+\alpha)/2$
is the real number in [0, 1] corresponding to weight $w$. Then
$\mathcal{F} \models (A\ w)$ iff $w \leq (\mathrm{lfp}\, T_P)(A)$.

For the case of possibilistic Horn clauses annotated with possibility degrees 
which is not addressed in [6], our own translation is sound but not complete, as 
discussed in the next example: 

Example 4. Consider the possibilistic theory:

    $(a \wedge b \to q\ (N\,1))$   $(p \to a\ (N\,1))$   $(p \to b\ (N\,1))$   $(p\ (\Pi\,1))$

Translating the above to a residuated logic program we find, in the least
fixpoint of the immediate consequences operator, that $q$ is provable with
degree $(\Pi\,0)$. However, in possibilistic logic we conclude
$(q\ (\Pi\,1))$. The problem is that our $T_P$ operator is not aware that the
proofs for $(a\ (\Pi\,1))$ and $(b\ (\Pi\,1))$ depend on the same proposition
$p$.
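
The loss of information can be traced step by step. The short computation
below assumes the encoding of (N 1) as 1 and of (Π 1) as 1/2, and evaluates
the translated rules by hand in dependency order rather than through a
general fixpoint loop; the variable names are illustrative.

```python
from fractions import Fraction

def times(p1, p2):
    """Multiplication of the possibilistic residuated algebra."""
    return min(p1, p2) if p1 + p2 > 1 else Fraction(0)

N1 = Fraction(1)       # assumed image of (N 1)
PI1 = Fraction(1, 2)   # assumed image of (Pi 1)

p = PI1                      # fact  (p (Pi 1))
a = times(N1, p)             # rule  a <- 1 x p
b = times(N1, p)             # rule  b <- 1 x p
q = times(times(N1, a), b)   # rule  q <- 1 x a x b
print(a, b, q)
```

Both a and b receive 1/2, the image of (Π 1), but their product falls to 0
because 1/2 + 1/2 does not exceed 1. The multiplication treats a and b as
two independent possibility degrees, which is exactly the incompleteness
described in the example.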

We conjecture, however, that in situations where the proof does not depend
on multiple uses of the same proposition, we extract the correct conclusions:

Example 5 (adapted from [6]). Consider the knowledge base:

    works(john, paris) → lives(john, paris)      (N 0.7)
    works(mary, paris) → lives(mary, paris)      (N 0.7)
    works(henry, paris) → lives(henry, paris)    (N 0.7)
    lives(mary, paris) → lives(john, paris)      (N 0.6)
    works(henry, paris) → works(john, paris)     (Π 0.8)
    works(mary, paris)                           (N 1)
    works(henry, paris)                          (N 1)

It translates to the monotonic logic program:

    lives(john, paris)  ← 0.85 × works(john, paris)
    lives(mary, paris)  ← 0.85 × works(mary, paris)
    lives(henry, paris) ← 0.85 × works(henry, paris)
    lives(john, paris)  ← 0.8 × lives(mary, paris)
    works(john, paris)  ← 0.4 × works(henry, paris)
    works(mary, paris)  ← 1
    works(henry, paris) ← 1

We gather from the least model of the above monotonic logic program that
works(mary, paris) and works(henry, paris) have confidence 1.0, i.e.
necessity degree (N 1). Moreover, both Henry and Mary live in Paris with
necessity degree (N 0.7). Regarding John, the proposition works(john, paris)
is ascribed the truth-value 0.4 corresponding to possibility degree (Π 0.8),
and lives(john, paris) receives value 0.8, meaning that it is necessary that
John lives in Paris with at least degree 0.6. Such results are in accordance
with Possibilistic Logic.
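
The least model of the translated program can be recomputed by iterating the
immediate consequences operator. The sketch below is one possible encoding:
the atom names, the rule representation and the decode helper are
assumptions of this illustration, and exact fractions stand in for the
decimal weights (17/20 for 0.85, 4/5 for 0.8, 2/5 for 0.4).

```python
from fractions import Fraction

def times(p1, p2):
    """Multiplication of the possibilistic residuated algebra."""
    return min(p1, p2) if p1 + p2 > 1 else Fraction(0)

# Rules as (head, weight, body atoms); facts have an empty body.
RULES = [
    ("lives_john",  Fraction(17, 20), ["works_john"]),
    ("lives_mary",  Fraction(17, 20), ["works_mary"]),
    ("lives_henry", Fraction(17, 20), ["works_henry"]),
    ("lives_john",  Fraction(4, 5),   ["lives_mary"]),
    ("works_john",  Fraction(2, 5),   ["works_henry"]),
    ("works_mary",  Fraction(1),      []),
    ("works_henry", Fraction(1),      []),
]

def step(interp):
    """One application of the immediate consequences operator."""
    new = {atom: Fraction(0) for atom in interp}
    for head, weight, body in RULES:
        value = weight
        for atom in body:
            value = times(value, interp[atom])
        new[head] = max(new[head], value)
    return new

def least_model():
    """Iterate from the least interpretation; only finitely many values
    can occur here, so the loop terminates."""
    interp = {head: Fraction(0) for head, _, _ in RULES}
    while True:
        nxt = step(interp)
        if nxt == interp:
            return interp
        interp = nxt

def decode(value):
    """Inverse of the assumed encoding: above 1/2 is a necessity degree."""
    return ("N", 2 * value - 1) if value > Fraction(1, 2) else ("Pi", 2 * value)

MODEL = least_model()
for atom, value in sorted(MODEL.items()):
    print(atom, value, decode(value))
```

The iteration stabilizes after three applications of the operator, and
decoding the values gives back the degrees stated in the example, for
instance (N 3/5) for lives_john and (Π 4/5) for works_john.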
6 Conclusions and Further Work

The strong point of this paper is the generality of our setting, both at the lan- 
guage and at the semantical level. We have presented an algebraic characteriza- 
tion of Monotonic Logic Programs. Program rules are arbitrary monotonic body 
functions where the heads are propositional symbols. Our semantic structures 
are implication algebras where an appropriate implication operator is imposed. 
We then obtain a logic programming semantics with corresponding model and 
fixpoint theory. The major construction is a generalized immediate consequences 
operator, in the spirit of van Emden and Kowalski’s Tp operator. The operator is 
monotonic and the models of a Monotonic Logic Program are its post-fixpoints. 
Therefore a minimum model is guaranteed to exist, it being the least fixpoint 
of the immediate consequences operator. Thus, when defining a new logic pro- 
gramming semantics we can shift attention to the stipulation and study of its 
intuitive semantical algebraic properties because the main results of definite logic 
programming carry over for free to this very general setting. 

We have studied residuated lattices and algebras, where a generalized form
of Modus Ponens Rule is valid. Having defined an implication (or residuum 
operator), and the associated multiplication, we can assign to program rules a 
confidence factor, thereby defining residuated logic programs. We show that the 
results of Monotonic Logic Programs carry over immediately to Residuated ones. 

We provide a simple translation of Possibilistic Logic Programming into 
Residuated Logic Programs, and discuss the issues arising from the introduc- 
tion of possibility degrees. In [2] we have previously shown that our framework 
can capture Hybrid Probabilistic Logic Programs [4]. Both ground Generalized 
Annotated Logic Programs [10] and Fuzzy Logic Programming [19] are also spe- 
cial cases of our setting. 

The embedding of c-annotated ground GAP rules under the restricted se- 
mantics is direct, and uses a technique similar to the one presented in [2]. More 
interestingly, we devised a way of translating ground GAP rules with
arbitrary annotations (constant, variable, or term annotated) into a single
Monotonic Logic Programming rule without occurrences of variables. This
translation
only assumes that (finite) meets are defined in the truth-value complete upper 
semi-lattice. The converse is also possible, using an extension of the embedding 
of Fuzzy Logic Programs into Annotated Logic Programming [18]. It shall be 
the subject of forthcoming work. 

Our incursion paves the way to combine and integrate several forms of
reasoning into a uniform framework, namely fuzzy, probabilistic, uncertain,
and paraconsistent reasoning. We have also defined [3] another class of logic
programs, extending the Monotonic one, where rule bodies may be
anti-monotonic functions, with well-founded and Stable Model like semantics.
This brings together non-monotonic and incomplete forms of reasoning with
those listed before.
Abstract. In a recent work we defined a possibilistic logic programming
language, called PGL+, dealing with fuzzy propositions and with a fuzzy
unification mechanism. The proof system, modus ponens-style, was shown to
be complete when restricted to a class of Horn clauses satisfying two
types of constraints. In this paper we complete the definition of the
logic programming system. In particular, we first formalize a notion of
PGL+ program and discuss the two types of constraints (called modularity
and context constraints) we argue they must satisfy; second, we extend
the PGL+ calculus with a chaining and fusion mechanism whose application
ensures the fulfillment of the modularity constraint; and finally, we
define a goal-oriented proof procedure, as efficient as possible, which
is complete for PGL+ programs satisfying the context constraint.



1 Introduction 

In the recent past, fuzzy logic programming has received increasing interest in 
areas such as Artificial Intelligence or deductive databases. This interest is due 
to the fact that both subfields of computer science need to produce systems 
exhibiting knowledge representation and reasoning models, more flexible than 
purely symbolic deduction. 

In the literature, most proposals for logic programming in logics of
uncertainty and fuzziness reduce to the question of how some generalization
of the resolution principle can be formulated and how the automated
deduction can be realized. The fuzzy resolution principle dates back to 1972,
when the paper of Lee [16] was published. Since then, a number of systems
have been proposed based on a variety of non-classical logics such as
multiple-valued logics [15,19], possibilistic logic [8], probabilistic logic
[17], evidential support logic [6], or
fuzzy operator logic [20]. Depending on the underlying logic, some systems
are more suitable for dealing with vague knowledge, while others are more
appropriate for reasoning under uncertainty. And, although there exist some
attempts to handle fuzzy unification [5,12,18] in the framework of fuzzy
logic programming, only the inference systems defined by Baldwin et al. [7]
for evidential reasoning, and Dubois et al. [11] and Godo and Vila [13] for
possibilistic logic, provide a unified framework for the treatment of
vagueness and uncertainty.

The necessity- valued fragment of Possibilistic logic [9] is a logic of uncertainty 
to reason with classical (propositional or first-order) formulas under incomplete 
information. To enhance the knowledge representation power, Dubois et al. [11] 
defined a syntactic extension of first-order possibilistic logic (called PLFC) to 
deal with fuzzy constants and fuzzily restricted quantifiers inside the language, 
for which Alsinet et al. [4] defined a formal semantics and a sound proof method 
for general clauses. PLFC provides a powerful framework for representing
disjunctive and conjunctive fuzzy information, but has some computational
limitations [4].
On the one hand, the current proof method for PLFC (refutation through a 
generalized resolution rule, a fusion rule and a merging rule) is not complete. 
On the other hand, during the proof process, the merging rule must be applied 
after every resolution step, and thus the search space consists of all possible 
orderings of the literals in the knowledge base. 

Due to these limitations, we have turned our attention to a possibilistic
logic programming language, Horn-rule style, allowing only disjunctive fuzzy
constants. Within this restricted framework our final aim is to fully define
a logical system for reasoning under possibilistic uncertainty and
disjunctive vague knowledge with an efficient and complete goal-oriented
proof procedure. To achieve this objective, we first defined in [1] a
possibilistic logic programming language of Horn rules with fuzzy
propositional variables, called PGL, and we provided it with a complete
modus ponens-style calculus. Afterwards, in [2], we extended this language
with disjunctive fuzzy constants (called PGL+) and the proof method with a
semantical unification mechanism of disjunctive fuzzy constants and three
other inference patterns. We proved that this extension was still complete
for atomic deduction when clauses fulfill two kinds of constraints.

In this paper we complete the definition of the possibilistic logic
programming language PGL+ by (i) formalizing a notion of suitable PGL+
program and discussing the need for the modularity and context constraints;
(ii) extending the proof method with a chaining and fusion mechanism whose
application in a pre-processing step ensures the satisfaction of the
modularity constraint of PGL+ programs; and (iii) defining a proof
procedure, as efficient as possible, for determining the maximum necessity
degree of a goal from a PGL+ program satisfying the context constraint. The
paper is organized as follows. In Section 2 we present the syntax, the
semantics and the inference machinery of PGL+. In Section 3 we formalize a
suitable notion of PGL+ program, while in Section 4 we discuss the modularity
and context constraints. Finally, in Section 5 we provide the basis for
automating the deductive proof method. Proofs and algorithms of the results
presented in this paper can be found in [3].
2 The Language PGL+

We start by briefly recalling from [2] the syntax, the many-valued semantics and
the possibilistic semantics of PGL+. Then, we present the logical inference, which
basically consists of a resolution (chaining) mechanism, a fusion mechanism and
a fuzzy unification mechanism.

The basic components of PGL+ formulas are: a set of primitive propositions
Var; sorts of constants; a set C of object constants (crisp and fuzzy constants),
each having its sort; a set Pred of unary^1 regular predicates, each one having a
type (a type is a tuple of sorts); and connectives $\wedge$, $\to$.

An atomic formula is either a primitive proposition from Var or of the form
$p(A)$, where $p$ is a predicate symbol from Pred, $A$ is an object constant from C
and the sort of $A$ corresponds to the type of $p$.

Formulas are Horn-rules of the form $p_1 \wedge \cdots \wedge p_k \to q$ with $k \geq 0$, where
$p_1, \ldots, p_k, q$ are atomic formulas. A (weighted) clause is a pair of the form $(\varphi, \alpha)$,
where $\varphi$ is a Horn-rule and $\alpha \in [0, 1]$.

Our intended interpretation is that a fuzzy constant represents a vague,
ill-defined property, and weights denote lower bounds on the belief degree, in terms
of necessity measures, that can be attached to events. For instance,
the statement “it is almost sure that he has a low salary” can be represented
in this framework by the clause $(salary(low), 0.9)$, where $salary(\cdot)$ is a
classical predicate of type $(euros)$; $low$ is a fuzzy constant of sort $euros$; and the
degree 0.9 expresses how much the formula $salary(low)$ is believed in terms of
a necessity measure. In case $low$ denotes a crisp interval of salaries, the clause
$(salary(low), 0.9)$ is interpreted as the sentence “$\exists x \in low$ such that $salary(x)$”
being certain with a necessity of at least 0.9. In case $low$ denotes a fuzzy
interval with a membership function $\mu_{low}$, the above clause is interpreted in
possibilistic terms as if, for each $\alpha \in [0, 1]$, the sentence “$\exists x \in [\mu_{low}]_\alpha$ such that
$salary(x)$” is certain with a necessity of at least $\min(0.9, 1 - \alpha)$, where $[\mu_{low}]_\alpha$
denotes the $\alpha$-cut of $\mu_{low}$. So, fuzzy constants can be seen as (flexible)
restrictions on an existential quantifier. Moreover, it is natural that the truth-value of
$salary(low)$, under a given interpretation in which the salary is $x_0$ euros, be the
degree to which the salary $x_0$ is considered to be low, i.e. $\mu_{low}(x_0)$. Therefore,
instead of Boolean (two-valued), PGL+ formulas are many-valued in nature, and
we shall take the whole unit interval [0, 1] as the set of truth-values.

A many-valued interpretation for the language is a structure $I = (U, i, m)$
which maps each basic sort $\sigma$ into a non-empty domain $U_\sigma$; a primitive
proposition $q$ into a value $i(q) \in [0, 1]$; a predicate $p$ of type $(\sigma)$ into a value $i(p) \in U_\sigma$;
and an object constant $A$ (crisp or fuzzy constant) of sort $\sigma$ into a normalized
fuzzy set $m(A)$ with membership function $\mu_{m(A)} : U_\sigma \to [0, 1]$. Remark that
interpretations are disjunctive in the sense that, for each predicate symbol $p$, $i(p)$
is a unique value of the domain. Indeed, in contrast to PLFC, fuzzy constants
in PGL+ always express disjunctive fuzzy knowledge.

^1 We restrict ourselves to unary predicates for the sake of simplicity. However, since
variables and function symbols are not allowed, the language still remains
propositional.

The truth value of an atomic formula $\varphi$ under an interpretation $I = (U, i, m)$,
denoted by $I(\varphi)$, is just $i(q)$ if $\varphi$ is a primitive proposition $q$, and it is computed
as $\mu_{m(A)}(i(p))$ if $\varphi$ is of the form $p(A)$. This truth value extends to rules by
means of the min-conjunction and Gödel's many-valued implication:



\[
I(p_1 \wedge \cdots \wedge p_k \to q) =
\begin{cases}
1, & \text{if } \min(I(p_1), \ldots, I(p_k)) \leq I(q) \\
I(q), & \text{otherwise}
\end{cases}
\]
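
The evaluation of a rule can be illustrated with a minimal Python sketch. The function names and the numeric truth values are hypothetical, chosen only for illustration; the sketch assumes the truth values of the atoms have already been computed from an interpretation.

```python
def godel_implication(x: float, y: float) -> float:
    """Goedel's many-valued implication: 1 if x <= y, else y."""
    return 1.0 if x <= y else y


def rule_truth(body_values, head_value):
    """Truth value of p_1 AND ... AND p_k -> q from the truth values of its atoms.

    The body is combined with min; an empty body (k = 0) has value 1.
    """
    body = min(body_values, default=1.0)
    return godel_implication(body, head_value)


# Hypothetical truth values, for illustration only.
full = rule_truth([0.8, 0.6], 0.7)   # body 0.6 <= head 0.7: rule is fully true
partial = rule_truth([0.8, 0.9], 0.3)  # body 0.8 > head 0.3: rule takes the head's value
fact = rule_truth([], 0.4)           # a fact (k = 0) takes the head's value
print(full, partial, fact)
```

With an empty body the rule reduces to its head, which agrees with reading a fact as a Horn-rule with $k = 0$.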



As is usual in uncertainty logics, the belief in propositions is measured
by means of an uncertainty measure on the set of interpretations. In a
possibilistic logic, the measure is naturally related to Possibility Theory. In classical
possibilistic logic, a necessity measure is used on top of the classical (Boolean)
interpretations of the language. Here we have many-valued interpretations, and
this means that each formula does not induce a crisp but a fuzzy set of
interpretations, hence the uncertainty measure has to be defined for these fuzzy sets.
Moreover we may choose multiple domains in which to interpret fuzzy constants.
Thus, in order to define a possibilistic semantics for PGL+, we need to fix a
meaning for the fuzzy constants and to consider some extension of the standard
notion of necessity measure for fuzzy events (cf. [1]). The first is achieved by
fixing what we call a context. Basically, a context is the set of interpretations
sharing a common domain and a common interpretation of object constants.
We denote by $\mathcal{I}_{U,m}$ the context of interpretations sharing a domain $U$ and an
interpretation of constants $m$. Notice that, in a given context $\mathcal{I}_{U,m}$, we can define
the fuzzy set $[\varphi]$ of models of a formula $\varphi$ just by taking $\mu_{[\varphi]}(I) = I(\varphi)$,
for all $I \in \mathcal{I}_{U,m}$.

Now, in a fixed context $\mathcal{I}_{U,m}$, a belief state (or possibilistic model) is
determined by a normalized possibility distribution on $\mathcal{I}_{U,m}$, $\pi : \mathcal{I}_{U,m} \to [0, 1]$. Then,
we say that $\pi$ satisfies a clause $(\varphi, \alpha)$, written $\pi \models (\varphi, \alpha)$, iff the necessity
measure of the fuzzy set of models of $\varphi$ with respect to $\pi$, denoted $N([\varphi] \mid \pi)$, is
indeed at least $\alpha$. Here we take



\[
N([\varphi] \mid \pi) = \inf_{I \in \mathcal{I}_{U,m}} \pi(I) \Rightarrow \mu_{[\varphi]}(I)
\]

where $\Rightarrow$ is the reciprocal of Gödel's many-valued implication, defined as
$x \Rightarrow y = 1$ if $x \leq y$ and $x \Rightarrow y = 1 - x$, otherwise. This necessity measure for fuzzy
sets was proposed and discussed by Dubois and Prade in [10]. As usual, a set
of clauses $P$ is said to entail another clause $(\varphi, \alpha)$, written $P \models (\varphi, \alpha)$, iff every
possibilistic model $\pi$ satisfying all the clauses in $P$ also satisfies $(\varphi, \alpha)$. Finally,
still in a context $\mathcal{I}_{U,m}$, the degree of possibilistic entailment of an atomic formula
(or goal) $\varphi$ by a set of clauses $P$, denoted by $\|\varphi\|_P$, is the greatest $\alpha \in [0, 1]$ such
that $P \models (\varphi, \alpha)$. In [2], we proved that $\|\varphi\|_P = \inf\{N([\varphi] \mid \pi) \mid \pi \models P\}$.
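
The necessity measure of a fuzzy set of models can be sketched on a finite context as follows. This is only an illustration under stated assumptions: the context is assumed to contain three interpretations labelled I1, I2, I3, and the possibility and truth values are hypothetical.

```python
def reciprocal_godel(x: float, y: float) -> float:
    """x => y = 1 if x <= y, and 1 - x otherwise."""
    return 1.0 if x <= y else 1.0 - x


def necessity(pi: dict, mu: dict) -> float:
    """N([phi] | pi): infimum over interpretations I of pi(I) => mu_[phi](I).

    `pi` and `mu` are dicts over the same finite set of interpretation labels.
    """
    return min(reciprocal_godel(pi[i], mu[i]) for i in pi)


def satisfies(pi: dict, mu: dict, alpha: float) -> bool:
    """pi satisfies the clause (phi, alpha) iff N([phi] | pi) >= alpha."""
    return necessity(pi, mu) >= alpha


# Hypothetical finite context with three interpretations.
pi = {"I1": 1.0, "I2": 0.5, "I3": 0.25}  # normalized: some interpretation has possibility 1
mu = {"I1": 1.0, "I2": 0.25, "I3": 0.5}  # truth value I(phi) in each interpretation

n = necessity(pi, mu)
print(n, satisfies(pi, mu, 0.5), satisfies(pi, mu, 0.6))
```

Only interpretations that are more possible than they are models of the formula lower the necessity degree; here that is I2 alone.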

Notation convention: Since we need to fix a context $\mathcal{I}_{U,m}$ in order to perform
deduction, we can identify a fuzzy constant $A$ with its interpreted fuzzy set $m(A)$ and
also with its membership function $\mu_{m(A)}$. Hence, for the sake of a simpler notation
we shall consider fuzzy constants simply as fuzzy sets. Further, if $A$ and $B$ are fuzzy
constants, $A \cap B$ and $A \cup B$ will refer to their fuzzy set min-intersection and max-union,
respectively.

The calculus for PGL+ in a given context is defined by the following set of 
inference rules: 



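Generalized resolution:

\[
\frac{(p \wedge s \to q(A), \alpha) \qquad (q(B) \wedge t \to r, \beta)}
     {(p \wedge s \wedge t \to r, \min(\alpha, \beta))}
\ [\mathrm{GR}], \text{ if } A \subseteq B
\]

Fusion:

\[
\frac{(p(A) \wedge s \to q(D), \alpha) \qquad (p(B) \wedge t \to q(E), \beta)}
     {(p(A \cup B) \wedge s \wedge t \to q(D \cup E), \min(\alpha, \beta))}
\ [\mathrm{FU}]
\]

Intersection:

\[
\frac{(p(A), \alpha) \qquad (p(B), \beta)}
     {(p(A \cap B), \min(\alpha, \beta))}
\ [\mathrm{IN}]
\]

Resolving uncertainty:

\[
\frac{(p(A), \alpha)}{(p(A'), 1)}
\ [\mathrm{UN}], \text{ for } A' = \max(1 - \alpha, A)
\]

Semantical unification:

\[
\frac{(p(A), \alpha)}{(p(B), \min(\alpha, N(B \mid A)))}
\ [\mathrm{SU}], \text{ where } N(B \mid A) = \inf_{u \in U_\sigma} A(u) \Rightarrow B(u)
\]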



For each context $\mathcal{I}_{U,m}$, the above GR, FU, SU, IN and UN inference rules can
be proved to be sound with respect to the possibilistic entailment of clauses.
Moreover we shall also refer to the following weighted modus ponens rule,
which can be seen as a particular case of the GR rule:

\[
\frac{(p_1 \wedge \cdots \wedge p_n \to q, \alpha) \qquad (p_1, \beta_1), \ldots, (p_n, \beta_n)}
     {(q, \min(\alpha, \beta_1, \ldots, \beta_n))}
\ [\mathrm{MP}]
\]
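
The rules acting on facts (SU, UN and IN) can be sketched over a finite domain, with fuzzy constants represented as dictionaries from domain elements to membership degrees. This is a minimal sketch, not the authors' implementation; the salary domain and the constants `low` and `at_most_2000` are hypothetical.

```python
def necessity_match(B: dict, A: dict, domain) -> float:
    """N(B | A) = inf_u A(u) => B(u), with x => y = 1 if x <= y else 1 - x."""
    return min(1.0 if A[u] <= B[u] else 1.0 - A[u] for u in domain)


def su(fact, B: dict, domain):
    """[SU]: from (p(A), alpha) derive (p(B), min(alpha, N(B | A)))."""
    A, alpha = fact
    return B, min(alpha, necessity_match(B, A, domain))


def un(fact, domain):
    """[UN]: from (p(A), alpha) derive (p(A'), 1) with A' = max(1 - alpha, A)."""
    A, alpha = fact
    return {u: max(1.0 - alpha, A[u]) for u in domain}, 1.0


def intersection(fact1, fact2, domain):
    """[IN]: from (p(A), alpha) and (p(B), beta) derive (p(A n B), min(alpha, beta))."""
    (A, alpha), (B, beta) = fact1, fact2
    return {u: min(A[u], B[u]) for u in domain}, min(alpha, beta)


# Hypothetical discrete salary domain and fuzzy constants.
domain = [1000, 2000, 3000]
low = {1000: 1.0, 2000: 0.5, 3000: 0.0}
at_most_2000 = {1000: 1.0, 2000: 1.0, 3000: 0.0}

fact = (low, 0.75)                                 # (salary(low), 0.75)
direct = su(fact, at_most_2000, domain)            # SU applied to the weighted fact
via_un = su(un(fact, domain), at_most_2000, domain)  # UN first, then SU
print(direct[1], via_un[1])
```

In this instance the degree obtained by unifying the weighted fact directly coincides with the degree obtained after first moving the uncertainty into the constant with UN.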

Finally, the notion of proof in PGL+, denoted by $\vdash$, is as in classical logic
programming languages, i.e. deduction by means of the triviality axiom and the
PGL+ inference rules. Then, given a context $\mathcal{I}_{U,m}$, the degree of deduction of a
goal $\varphi$ from a set of clauses $P$, denoted $|\varphi|_P$, is the greatest $\alpha \in [0, 1]$ for which
$P \vdash (\varphi, \alpha)$.
3 PGL+ Programs

A program is a triple $\mathcal{P} = (P, U, m)$, where $P$ is a finite set of PGL+ clauses;
$U$ is a collection of non-empty domains; and $m$ is an interpretation of object
constants over $U$ such that for each object constant $B$ appearing in $P$ there
exist $u, v \in U_\sigma$, $\sigma$ being the sort of $B$, such that $B(u) = 0$ and $B(v) = 1$. Notice
that a program $\mathcal{P} = (P, U, m)$ determines a particular context $\mathcal{I}_{U,m}$ in the sense
of the previous section. Further, we say that $\mathcal{P} = (P, U, m)$ is a non-recursive
program if $P$ does not contain recursive formulas^2 and is satisfiable if there
exists a normalized possibility distribution $\pi : \mathcal{I}_{U,m} \to [0, 1]$ that satisfies all the
clauses in $P$. The idea is that we shall restrict ourselves to non-recursive and
satisfiable programs. Next we justify this choice.

Given a context $\mathcal{I}_{U,m}$, it is easy to check that in PGL+ the sets of clauses
$P = \{(p(A), \alpha), (p(A) \to q(B), \beta)\}$ and $P' = \{(q(B), \min(\beta, \alpha))\}$ are
equivalent as far as we are interested in the entailment degree of a goal $q(C)$, i.e.
$\|q(C)\|_P = \|q(C)\|_{P'}$. However, this intuitive behavior may be lost when we
consider programs with recursive formulas. For instance, consider the set of
clauses

$Q = \{(age(young) \wedge study\_university \to age(btw\_19\_26), 0.6),$
$\quad\ (study\_university, 1)\}$,

in a particular context. The Horn-rule in $Q$ is intended to express that
“somebody can be assumed to be between 19 and 26 years old (with a necessity $\geq 0.6$)
whenever we know both that he/she is young (in a broad sense) and that he/she is
studying at the university”. Hence, since $Q$ has no clause about whether the student
is young, one would expect to infer nothing from $Q$ about his/her age from the sole
fact that he/she is a university student. But it turns out that from $Q$ one can
logically derive the clause $(age(not\_young\_or\_btw\_19\_26), 0.6)$, where

$not\_young\_or\_btw\_19\_26(u) = young(u) \Rightarrow_G btw\_19\_26(u)$

with $\Rightarrow_G$ being Gödel's implication. The reason is that a recursive Horn-rule
like $p(A) \wedge q \to p(B)$ is logically equivalent, in Gödel's logic, to the formula
$q \to (p(A) \to p(B))$, and in our framework this is equivalent to $q \to p(C)$,
where the fuzzy constant $C$ is pointwise defined as $C(u) = A(u) \Rightarrow_G B(u)$.
Therefore, even though it would be possible to define an inference pattern for
transforming recursive formulas into non-recursive ones, the system user could
be negatively surprised by some of the computations done by the system, as we
have seen above.

On the other hand, in contrast to the classical case, PGL+ programs are not
always satisfiable, and moreover, the satisfiability of a set of clauses depends on
the interpretation of object constants. Therefore, in order to define a sound and
coherent logic programming system, we shall restrict ourselves to non-recursive
and satisfiable programs, simply referred to in the rest of the paper as PGL+
programs.

^2 A recursive formula is of the form $p \wedge q(B) \to q(C)$ or is the result of combining two
or more formulas of the form $s \wedge p(A) \to q(B)$ and $r \wedge q(C) \to p(D)$.
4 The Modularity and Context Constraints

In this section we describe and discuss two kinds of constraints that, we argue,
PGL+ programs must satisfy in order for the proof method to be complete. First
we focus on what we call the modularity constraint, which, in contrast to PLFC,
can be fulfilled by a pre-processing step of the program by means of
the GR and FU rules. Then, we establish the bases for defining an efficient and
complete proof procedure for PGL+ programs satisfying what we call the context
constraint.

4.1 Modularity Constraint 

The satisfaction of the modularity constraint by a PGL+ program ensures that
all (explicit and hidden) clauses of the program are considered. Indeed, since fuzzy
constants are interpreted as (flexible) restrictions on an existential quantifier,
atomic formulas clearly express disjunctive information. For instance, when $A =
\{a_1, \ldots, a_n\}$, $p(A)$ is equivalent to the disjunction $p(a_1) \vee \ldots \vee p(a_n)$. Therefore,
when parts of this (hidden) disjunctive information occur in the body of several
formulas of a program, and we do not want to lose them, we are led to perform
a completion process of the program, as a pre-processing step, based on the GR
and FU rules. Let us briefly discuss this requirement by means of an example.

Example 1. Let $\mathcal{P} = (P, U, m)$ be a PGL+ program with

$P = \{(p(A) \to q, 1),\ (p(B) \to r(C), 1),\ (r(C') \to q, 1),\ (p(D), 1)\}$,

and with $m$ such that $A = \{a_1, a_2\}$, $B = \{b_1, b_2\}$, $C = \{c_1, c_2\}$, $C' = \{c_1, c_2, c_3\}$
and $D = \{a_1, b_1\}$. We can easily check that, using only the MP and SU rules, the
goal $q$ cannot be deduced from $P$ with a strictly positive degree. However, it
is not difficult to check as well that $(p(A \cup B) \to q, 1)$ is indeed a clause which
logically follows from $P$ and which can be proved from $P$ if the GR and FU
rules are used. Therefore, if we first complete $P$ with $(p(A \cup B) \to q, 1)$, then
$(q, 1)$ will also be provable by using the MP and SU rules. □
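
Example 1 can be replayed with a small Python sketch. It is deliberately restricted, as an assumption for brevity, to crisp constants (represented as frozensets) and to rules with a single body atom; for crisp sets, $N(B \mid A)$ is 1 when $A \subseteq B$ and 0 otherwise. The tuple encoding and function names are hypothetical.

```python
A = frozenset({"a1", "a2"})
B = frozenset({"b1", "b2"})
C = frozenset({"c1", "c2"})
C_PRIME = frozenset({"c1", "c2", "c3"})
D = frozenset({"a1", "b1"})

# A rule is (body_pred, body_set, head_pred, head_set, weight);
# head_set is None when the head is a primitive proposition such as q.
rules = {("p", A, "q", None, 1.0), ("p", B, "r", C, 1.0), ("r", C_PRIME, "q", None, 1.0)}
facts = {("p", D, 1.0)}  # a fact is (pred, set, weight)


def complete(rules):
    """Close a set of single-body rules under GR (chaining) and FU (fusion)."""
    rules = set(rules)
    while True:
        new = set()
        for bp1, bs1, hp1, hs1, w1 in rules:
            for bp2, bs2, hp2, hs2, w2 in rules:
                # GR: the head q(A) of the first rule meets the body q(B) of the second, A within B.
                if hp1 == bp2 and hs1 is not None and hs1 <= bs2:
                    new.add((bp1, bs1, hp2, hs2, min(w1, w2)))
                # FU: same body and head predicates, incomparable body constants.
                if bp1 == bp2 and hp1 == hp2 and not bs1 <= bs2 and not bs2 <= bs1:
                    head = None if hs1 is None else hs1 | hs2
                    new.add((bp1, bs1 | bs2, hp1, head, min(w1, w2)))
        if new <= rules:
            return rules
        rules |= new


def mp_su_closure(rules, facts):
    """Forward chaining with MP, each premise matched by SU (crisp case)."""
    facts = set(facts)
    while True:
        new = {(hp, hs, min(w, fw))
               for bp, bs, hp, hs, w in rules
               for fp, fs, fw in facts
               if fp == bp and fs is not None and fs <= bs}
        if new <= facts:
            return facts
        facts |= new


def degree(goal, rules, facts):
    """Best weight derived for the proposition `goal` (0 if none)."""
    return max((w for p, s, w in mp_su_closure(rules, facts) if p == goal), default=0.0)


before = degree("q", rules, facts)
completed = complete(rules)
after = degree("q", completed, facts)
print(before, after)
```

Before completion the fact $p(D)$ matches neither $p(A)$ nor $p(B)$, so nothing is derived; after completion it matches $p(A \cup B)$.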

At this point we are ready to formalize the modularity constraint of a PGL+
program. Let $\mathcal{P} = (P, U, m)$ be a PGL+ program. We recursively define the set
of valid clauses of $\mathcal{P}$, denoted by $P^+$, in the following way:

1. $P \subseteq P^+$.

2. If $(\varphi \to q(A), \alpha)$ and $(\psi \wedge q(B) \to r(C), \beta) \in P^+$ are such that $A \subseteq B$, then
$(\varphi \wedge \psi \to r(C), \min(\alpha, \beta)) \in P^+$ as well.

3. If $(p(A) \wedge \varphi \to q(D), \alpha)$ and $(p(B) \wedge \psi \to q(E), \beta) \in P^+$ are such that $A \not\subseteq B$
and $B \not\subseteq A$, then $(p(A \cup B) \wedge \varphi \wedge \psi \to q(D \cup E), \min(\alpha, \beta)) \in P^+$ as well.

4. Only the clauses obtained by 1, 2 or 3 belong to $P^+$.

Then, we say that $\mathcal{P}$ satisfies the modularity constraint if $P = P^+$. Moreover, we
say that a clause $(\varphi, \alpha) \in P$ is a basic clause if $(P^+ \setminus \{(\varphi, \alpha)\})^+ = P^+ \setminus \{(\varphi, \alpha)\}$.

The following facts shed light on the relationship between a PGL+ program
$\mathcal{P}$ and its set of valid clauses $P^+$: (i) if $(\varphi, \alpha) \in P^+$, then $P \vdash (\varphi, \alpha)$; (ii) for any
goal $q(C)$, $\|q(C)\|_P = \|q(C)\|_{P^+}$; and (iii) for any predicate $q$ appearing in $P$ in
the head of a formula, there exists a clause $(\varphi, \alpha) \in P$ such that the head of $\varphi$
is $q$ and $(\varphi, \alpha)$ is a basic clause of $\mathcal{P}$.

4.2 Context Constraint 

Now let us consider another kind of constraint we want our programs to satisfy.
The idea is that in a PGL+ program satisfying the so-called context constraint
the use of the SU and MP inference rules is enough to attain a degree of deduction
equal to the degree of possibilistic entailment. For this we need that

— the SU and MP rules work in a, say, locally complete way, and that 

— the possibilistic entailment degree of a goal be univocally determined by 
those clauses in the program having the goal in their head or leading to one 
of these clauses by resolving them with other clauses. 

Next we argue the need for these requirements. 

As for the first one, it is indeed needed to avoid problems of weakening of the
deduction power in simple modus ponens inference steps involving a unification
process. For instance, given a context $\mathcal{I}_{U,m}$, we can prove $\|p(A)\|_{\{(p(E), \alpha)\}} =
\min(\alpha, \delta)$, where $\delta = N(A \mid E)$, and $\|q(C)\|_{\{(p(A), \min(\alpha, \delta)), (p(A) \to q(B), \beta)\}} =
\|q(C)\|_{\{(q(B), \min(\alpha, \delta, \beta))\}}$. However, due to the necessity measure used for
computing the partial matching between fuzzy constants, the expected equality
$\|q(C)\|_{\{(p(E), \alpha), (p(A) \to q(B), \beta)\}} = \|q(C)\|_{\{(q(B), \min(\alpha, \delta, \beta))\}}$ may not hold. Actually,
it strongly depends on how $m$ interprets each object constant. Indeed, it can
be proved that the equality holds iff either $\min(\alpha, \beta) \leq \delta$ or, for some $v \in U_\sigma$,
$A(v) = 0$ and $E(v) = 1 - \delta$, $\sigma$ being the type of $p$.

Example 2. Let $\mathcal{P} = (P, U, m)$ be a PGL+ program with

$P = \{(age(btw\_19\_21) \to weight(about\_50), 0.4),\ (age(about\_20), 0.9)\}$,

and with $m$ attaching to constants the following trapezoidal^3 fuzzy sets:
$btw\_19\_21 = [18; 19; 21; 22]$ (years), $about\_20 = [15; 20; 20; 25]$ (years), and
$about\_50 = [45; 50; 50; 55]$ (kilograms). Then, $N(btw\_19\_21 \mid about\_20) =
0.25 < \min(0.9, 0.4)$ and, for all $u \in years$ such that $btw\_19\_21(u) = 0$, it
is $about\_20(u) < 0.75$. And one can check that, for any fuzzy constant $C$,
$\|weight(C)\|_P = \|weight(C)\|_{\{(weight(D), 1)\}}$, where

\[
D(u) =
\begin{cases}
about\_50(u), & \text{if } u \in [48, 52] \\
0.6, & \text{otherwise}
\end{cases}
\]

Therefore, $\|weight(about\_50)\|_{\{(weight(D), 1)\}} = 0.4$, and hence, although
$\|age(btw\_19\_21)\|_P < 0.4$, we get that $\|weight(about\_50)\|_P = 0.4$. However, our
actual intention was to express that “people can be assumed to weigh about
50 kilos (with a necessity $\geq 0.4$) whenever we know they are between 19 and 21
years old”. □
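
The two degrees quoted in Example 2 can be checked numerically. The sketch below approximates the infimum in $N(B \mid A)$ on a finite grid, which is an assumption of the illustration (the exact value is an infimum over a continuous domain, so the grid result is only close to it); the grid bounds and helper names are hypothetical, and the constant $D$ is taken as given in the example.

```python
def trapezoid(t1, t2, t3, t4):
    """Membership function of the trapezoidal fuzzy set [t1; t2; t3; t4]
    (support [t1, t4], core [t2, t3])."""
    def mu(u):
        if t2 <= u <= t3:
            return 1.0
        if u <= t1 or u >= t4:
            return 0.0
        return (u - t1) / (t2 - t1) if u < t2 else (t4 - u) / (t4 - t3)
    return mu


def necessity_match(B, A, grid):
    """Grid approximation of N(B | A) = inf_u A(u) => B(u),
    with x => y = 1 if x <= y else 1 - x."""
    return min(1.0 if A(u) <= B(u) else 1.0 - A(u) for u in grid)


btw_19_21 = trapezoid(18, 19, 21, 22)
about_20 = trapezoid(15, 20, 20, 25)
about_50 = trapezoid(45, 50, 50, 55)

years = [10 + i / 100 for i in range(2001)]  # 10.00, 10.01, ..., 30.00
kilos = [40 + i / 100 for i in range(2001)]  # 40.00, 40.01, ..., 60.00

# Partial matching between the rule body and the available fact.
delta = necessity_match(btw_19_21, about_20, years)


def D(u):
    """The constant D of Example 2."""
    return about_50(u) if 48 <= u <= 52 else 0.6


weight_degree = necessity_match(about_50, D, kilos)
print(round(delta, 3), round(weight_degree, 3))
```

The grid value of `delta` approaches 0.25 from above as the grid is refined, and it stays below $\min(0.9, 0.4)$, which is the situation the example describes.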

^3 We represent a trapezoidal fuzzy set as $[t_1; t_2; t_3; t_4]$, where the interval $[t_1, t_4]$ is the
support and the interval $[t_2, t_3]$ is the core.
Again, it would be possible to define an inference pattern for transforming clauses
as we have done in Example 2. However, the logic programming system would
have some important computational limitations, besides probably surprising
an unaware user with some results computed by the system. In fact, after each
resolution step, the membership function of the fuzzy constant in the resolvent 
clause should be recomputed from the interpretation of all fuzzy constants in 
the body of the resolved clause. Moreover, at this point, some new valid clauses 
should be considered since they could not be computed in a pre-processing step 
(see Section 5). Therefore, in order to define a sound and efficient proof pro- 
cedure, we have to consider PGL+ programs with well-behaved (in the above 
sense) interpretations of fuzzy constants. 

As for the second requirement, the objective of the context constraint is
to ensure that, for any goal $q(C)$, $\|q(C)\|_P$ can be determined only from the
subset $P_q$ of clauses $(\varphi, \alpha)$ in $P$ for which either $q$ is in the head of $\varphi$ or $q$
depends^4 on the head of $\varphi$. Roughly speaking, with this requirement one wants
to avoid having formulas of the form $(q(A) \to t(B), \alpha)$ and $(t(\bar{B}), \beta)$ together
in a program since, due to the disjunctive interpretation of fuzzy constants, we
would have that a formula like $(q(\bar{A}), \delta)$ should be derivable, where $\bar{A}$ and $\bar{B}$
denote the complement of $A$ and $B$, respectively, and thus we should enable a
kind of modus tollens inference mechanism.

Example 3. Consider the set of clauses

$P = \{(age(btw\_17\_20), 1),\ (weight(40), 1),$
$\quad\ (age(btw\_18\_20) \to weight(btw\_50\_55), 1)\}$,

the goal $age(17)$, and a particular context in which the object constants
$btw\_17\_20$, $btw\_18\_20$ and $btw\_50\_55$ are interpreted as the crisp intervals
[17, 20] (years), [18, 20] (years) and [50, 55] (kilograms), respectively. One can
check that $\|age(17)\|_P = \|age(17)\|_{\{(age(17), 1)\}} = 1$. However, in $P$ there is
no explicit information expressing that “he or she is 17 years old”, and thus
$\|age(17)\|_P$ should be determined just from $(age(btw\_17\_20), 1)$. □

In order to formalize the context constraint we need the following result,
already stated in [2]. Let $\mathcal{P} = (P, U, m)$ be a PGL+ program, let $q$ be a predicate
symbol of type $\sigma$ appearing in $P$, and let us denote by $D_q$ the object constant
(of sort $\sigma$) such that, for all $u \in U_\sigma$,

$D_q(u) = \inf\{D(u) \mid P \vdash (q(D), 1)\}$.

Then, it holds that $P \vdash (q(D_q), 1)$ and $\|q(C)\|_P = \|q(C)\|_{\{(q(D_q), 1)\}}$.

At this point we are ready to formalize the context constraint. Let $\mathcal{P} =
(P, U, m)$ be a PGL+ program and let $I_{sat} = (U, i_{sat}, m)$ be an interpretation of
$\mathcal{I}_{U,m}$ such that $I_{sat}(\phi) = 1$, for each $(\phi, \gamma) \in P$ with $\gamma > 0$. Further, for each
predicate symbol $q$ appearing in $P$, denote by $P_q$ the set of clauses $\{(\varphi, \alpha)\} \subseteq P$
such that the head of $\varphi$ is $q$ or $q$ depends on the head of $\varphi$ in $P$; and denote
by $P_q^+$ the set of valid clauses of $P_q$. Then, we say that $\mathcal{P}$ satisfies the context
constraint if, for each predicate $q$ appearing in $P$, it holds that

C1: for each clause of the form $(p(E) \to q(F), \delta) \in P_q^+$, either $\delta \leq \|p(E)\|_{P_p^+}$ or,
for some $v$, $E(v) = 0$ and $D_p(v) = 1 - \|p(E)\|_{P_p^+}$; and

C2: for each clause of the form $(q(A) \to t(B), \beta) \in P \setminus P_q$, for each $u$ either
$A(u) \leq a$ or $D_q(u) \leq \max(1 - \beta, a)$, with $a = B(i_{sat}(t))$.

Then it can be proved that C1 ensures that the degree of deduction obtained from
$P_q^+$ by applying the SU and MP rules is exactly the degree of possibilistic
entailment, and C2 ensures $\|q\|_P = \|q\|_{P_q}$, for any $q$.

^4 We say that $q$ depends on $p$ in $P$ if $P$ contains a set of clauses $\{(\varphi_1, \alpha_1), \ldots, (\varphi_k, \alpha_k)\}$,
with $k \geq 1$, such that $p$ appears in the body of $\varphi_1$, the head of $\varphi_k$ is $q$, and the head
of $\varphi_i$ appears in the body of $\varphi_{i+1}$, with $i \in \{1, \ldots, k - 1\}$.
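
The dependency relation between predicates, and the subprogram $P_q$ built from it, can be sketched as a reachability computation. As a simplifying assumption, clauses are reduced here to pairs (body predicates, head predicate), dropping constants and weights; the three clauses mirror the shape of Example 3 and the function names are hypothetical.

```python
def depends_on(q, p, clauses):
    """True if q depends on p: some chain of clauses leads from a body
    containing p to a clause whose head is q."""
    frontier, reached = [p], set()
    while frontier:
        current = frontier.pop()
        for body, head in clauses:
            if current in body and head not in reached:
                reached.add(head)
                frontier.append(head)
    return q in reached


def subprogram(q, clauses):
    """P_q: the clauses whose head is q, or on whose head q depends."""
    return [(body, head) for body, head in clauses
            if head == q or depends_on(q, head, clauses)]


# Predicate-level skeleton of a program shaped like Example 3.
program = [
    ((), "age"),           # a fact about age
    ((), "weight"),        # a fact about weight
    (("age",), "weight"),  # a rule from age to weight
]

p_age = subprogram("age", program)
p_weight = subprogram("weight", program)
print(p_age, len(p_weight))
```

Here the rule from age to weight lies outside $P_{age}$, so it is exactly the kind of clause that condition C2 inspects for the predicate age.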

The most important feature of the context constraint is that, in each
deduction step, it can be checked from the previously computed information.
However, the bad news is that the constraint C2 is actually stronger than the
second requirement listed above. Indeed, if a PGL+ program $\mathcal{P} = (P, U, m)$ does
not satisfy C2 for some predicate $q$ appearing in $P$, we cannot know whether
$\|q\|_P = \|q\|_{P_q}$; thus, in that case we should check whether, for all $u$,

\[
D_q(u) \leq \sup \Big\{ \min_{(\phi, \gamma) \in P} \max(1 - \gamma, I(\phi)) \ \Big|\ I = (U, i, m),\ i(q) = u \Big\},
\]

which in turn is equivalent to determining whether $D_q(u) \leq \inf\{D(u) \mid P \models
(q(D), 1)\}$. Thus, to strongly ensure the equality $\|q\|_P = \|q\|_{P_q}$ is equivalent to
extending the PGL+ proof method for determining $\|q\|_{P \setminus P_q}$. Hence, our current
context constraint is a useful approach for ensuring that $\|q\|_P$ can be computed
just from the clauses of $P_q$, and allowing us to define an efficient (as much
as possible) proof procedure. Moreover, the context constraint can be checked
for each predicate in an incremental way, and thus, for each goal the proof
procedure can determine if the computed degree of deduction is in fact the
degree of possibilistic entailment.

5 Automated Deduction 

The proof procedure for PGL+ can be divided into three algorithms which have to
be applied sequentially: a completion algorithm, based on the GR and FU rules,
which extends a PGL+ program with all valid clauses; a translation algorithm,
based on the MP, SU, UN and IN rules, which translates a PGL+ program
satisfying the modularity constraint into a semantically equivalent set of 1-weighted
facts, whenever the program satisfies the context constraint; and, finally, a
deduction algorithm, based on the SU rule, which computes the maximum degree
of possibilistic entailment of a goal from the equivalent set of 1-weighted facts.
Next we briefly describe the bases for designing each algorithm.

Given a PGL+ program $\mathcal{P} = (P, U, m)$, the completion algorithm first
computes the set of valid clauses that can be derived from $P$ by applying the GR
rule (i.e. by chaining clauses). Then, from this new set of valid clauses, the
algorithm computes all valid clauses that can be derived by applying the FU rule
(i.e. by fusing clauses). As the FU rule stretches the body of rules and the GR
rule modifies the body or the head of rules, the chaining and fusion steps have
to be performed while new valid clauses are derived. As the chaining and fusion
steps cannot produce infinite loops and each valid clause of $P$ is either a basic
clause or can be derived from at least two clauses of $P$, in the worst case each
combination of clauses of $P$ derives a different valid clause. Hence, as $P$ is a finite
set of clauses, denoting by $N$ the number of clauses of $P$, in the worst case the
number of valid clauses is iV -|- Ci) ^ ^( ^n/2 )• However, in general, only 

a reduced set of clauses can be combined to derive new valid clauses. Indeed,
$c_1$, $c_2$ and $c_3$ can derive a new valid clause if $c_1$ and $c_2$, $c_1$ and $c_3$, or $c_2$ and $c_3$
derive a valid clause different from $c_1$, $c_2$ and $c_3$.

The algorithm for translating a PGL+ program $\mathcal{P} = (P, U, m)$ into a set of
1-weighted facts is based on the following result: $\|q(C)\|_P = \|q(C)\|_{\{(q(D_q), 1)\}}$,
where $D_q(u) = \inf\{D(u) \mid P \models (q(D), 1)\}$. Then, as $\|q(C)\|_P = \|q(C)\|_{P^+}$,
if $(P^+, U, m)$ satisfies the context constraint, the $D_q$ can be determined just
from $P_q^+$ (i.e. the clauses of $P^+$ whose heads are $q$ or such that $q$ depends on their
heads), and each rule in $P_q^+$ can be replaced by a fact applying the SU
and MP rules: each clause $(p_1 \wedge \ldots \wedge p_n \to q, \alpha) \in P_q^+$ can be replaced by
$(q, \min(\alpha, \min_{j=1,\ldots,n} \|p_j\|_P))$. At this point, $D_q$ can be computed from this finite
set of facts by applying the UN and IN rules. As we only consider non-recursive
programs, the above mechanism can be recursively applied for determining $\|p\|_P$
for each predicate $p$ such that $q$ depends on $p$ in $P$, and thus, the time
complexity of the translation algorithm is linear in the total number of occurrences of
predicate symbols in $(P)^+$.

Finally, if $(q(D_q), 1)$ is the 1-weighted fact computed by the translation
algorithm for $q$, we have that $|q(C)|_P = N(C \mid D_q)$, and thus, after applying
the completion and translation algorithms to a PGL+ program, $|q(C)|_P$ can be
computed in constant time, in the sense that it is equivalent to
computing the partial matching between two fuzzy constants. Moreover, if the
program satisfies the context constraint, we can ensure that $|q(C)|_P = \|q(C)\|_P$.
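
The last two stages can be sketched for a single predicate $q$ over a finite domain. The sketch assumes the rules for $q$ have already been replaced by weighted facts $(q(A_i), \alpha_i)$; the domain, the constants and the weights below are hypothetical and serve only to illustrate how UN and IN produce $D_q$ and how SU then yields $N(C \mid D_q)$.

```python
def translate(facts, domain):
    """Combine facts (q(A_i), alpha_i) into one 1-weighted fact (q(D_q), 1).

    UN turns each fact into (q(max(1 - alpha_i, A_i)), 1);
    IN then intersects the resulting constants with min.
    """
    return {u: min(max(1.0 - alpha, A[u]) for A, alpha in facts) for u in domain}


def deduce(C, Dq, domain):
    """Degree of deduction of the goal q(C): N(C | D_q) = inf_u D_q(u) => C(u),
    with x => y = 1 if x <= y else 1 - x."""
    return min(1.0 if Dq[u] <= C[u] else 1.0 - Dq[u] for u in domain)


# Hypothetical finite domain and weighted facts for one predicate q.
domain = [1, 2, 3, 4]
A1 = {1: 1.0, 2: 1.0, 3: 0.5, 4: 0.0}
A2 = {1: 0.5, 2: 1.0, 3: 1.0, 4: 1.0}
facts = [(A1, 0.75), (A2, 0.5)]

Dq = translate(facts, domain)

goal = {1: 0.0, 2: 1.0, 3: 1.0, 4: 0.0}  # the constant C of the goal q(C)
result = deduce(goal, Dq, domain)
print(Dq, result)
```

Once $D_q$ is stored, each further goal on $q$ costs a single partial matching, which is the sense in which deduction is cheap after the pre-processing.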

One of the most important features of PGL+ is that when extending a pro- 
gram with new facts only the set of 1-weighted facts must be computed again, 
and thus, the set of hidden clauses of a program, which from a computational 
point of view is the hard counterpart of dealing with fuzzy constants, must be 
computed again only if new rules are added to the knowledge base. 

6 Conclusions 

In the present paper we have completed the definition, already started in two
previous works [1,2], of PGL+, a possibilistic logic programming language with
fuzzy constants and a fuzzy unification mechanism. Namely, we have identified
and formalized a class of well-behaved programs and have provided this class
of programs with a chaining (resolution) and fusion mechanism that, together
with the fuzzy unification mechanism, has allowed us to design an efficient and
complete goal-oriented proof procedure. In our opinion, this is a key feature
that justifies by itself the interest of such a logic programming system. Future
work will address the issue of checking the satisfiability of programs.
Abstract. In this paper, we study indiscernibility relations and
complementarity relations in information systems. The first-order
characterization of indiscernibility and complementarity is obtained through a
duality result between information systems and certain structures of
relational type characterized by first-order conditions. The modal analysis
of indiscernibility and complementarity is performed through a modal
logic whose modalities correspond to indiscernibility relations and
complementarity relations in information systems.



1 Introduction 

Information systems are knowledge-based systems which describe properties of 
objects in terms of attributes. They provide an effective and broadly applicable 
framework for the management and the processing of uncertainty, a crucial issue 
in the development of reasoning systems that are concerned with incomplete in- 
formation. The increasing number of knowledge-based systems that manage and 
process incomplete information leads us to develop formal methods for reason- 
ing about uncertain knowledge discovered from information systems. Initiated 
by Pawlak [8] and furthered by Demri [2], Demri, Orlowska and Vakarelov [3], 
Orlowska [5,6], Orlowska and Pawlak [7] and Vakarelov [9,10,11,12], the theo- 
retical foundations of information systems investigate the relationships between 
objects determined by their properties. All the relations defined in this context 
are either indistinguishability relations or distinguishability relations. Indistin- 
guishability relations indicate the way objects share properties whereas distin- 
guishability relations indicate the way properties differentiate objects. Typical 
issues are the following: first-order characterization and modal analysis of var- 
ious classes of indistinguishability relations and distinguishability relations. To 
obtain the first-order characterization of a class of indistinguishability relations 
and distinguishability relations, one has to find first-order conditions such that 
relations satisfying these conditions correspond to the indistinguishability rela- 
tions and the distinguishability relations of this class derived from information 
systems. To perform the modal analysis of a class of indistinguishability relations
and distinguishability relations, one has to address the questions of
axiomatization/completeness and decidability/complexity of a modal logic whose
modalities correspond to the indistinguishability relations and the distinguishability
relations of this class. In this paper, extending the line of reasoning suggested
by Demri, Orlowska and Vakarelov [3], we study indiscernibility relations and
complementarity relations in information systems. The first-order
characterization of indiscernibility and complementarity is obtained through a duality result
between certain structures of relational type characterized by first-order
conditions and information systems. The modal analysis of indiscernibility and
complementarity is performed through a modal logic whose modalities correspond to
indiscernibility relations and complementarity relations in information systems.



2 Indiscernibility and Complementarity 

Adapted from Pawlak [8], an information system will be any structure $(Att,
Obj, \{Val_a \mid a \in Att\}, f)$ where:

— $Att$ is a nonempty set of attributes;

— $Obj$ is a nonempty set of objects;

— For all $a \in Att$, $Val_a$ is a nonempty subset of a fixed nonempty set $Val$ of
properties;

— $f$ is a function with domain $Att \times Obj$ and range the power set of $Val$ such
that for all $a \in Att$ and for all $x \in Obj$, $f(a, x) \subseteq Val_a$.

Consider, for example, the information system $S = (Att, Obj, \{Val_a \mid
a \in Att\}, f)$ defined as follows:

— $Att$ is $\{Languages, Sports\}$;

— $Obj$ is $\{Ann, Bob, Cindy, Daniel, Emma\}$;

— $Val_{Languages}$ is $\{Arabic, Bulgarian, Castilian, Dutch\}$;

— $Val_{Sports}$ is $\{athletics, basketball, cycling\}$;

— $f$ is the function defined by table 1.

In this information system, the object Bob possesses the properties Arabic and
Bulgarian of mastering Arabic and Bulgarian whereas the object Daniel
possesses the properties athletics and cycling of practising athletics and cycling.
Information systems constitute the starting point for the formal examination of
sentences of the form “object x is indistinguishable from object y” or sentences
of the form “object x is distinguishable from object y”. In this respect,
indiscernibility relations and complementarity relations play an important role. Let
$S = (Att, Obj, \{Val_a \mid a \in Att\}, f)$ be an information system. For all $x, y \in Obj$,
define:

Strong indiscernibility: $x \equiv_S y$ iff for all $a \in Att$, $f(a, x) = f(a, y)$;

Strong complementarity: $x R_S y$ iff for all $a \in Att$, $f(a, x) = Val_a \setminus f(a, y)$.
Table 1. Example of an information system.

f          Ann                      Bob                      Cindy               Daniel                Emma
Languages  {Arabic, Bulgarian}      {Arabic, Bulgarian}      {Castilian, Dutch}  {Arabic, Bulgarian}   {Arabic, Castilian}
Sports     {athletics, basketball}  {athletics, basketball}  {cycling}           {athletics, cycling}  {cycling}

Intuitively, two objects are strongly indiscernible if all their respective sets of properties determined by the attributes are indiscernible whereas two objects are strongly complementary if all their respective sets of properties determined by the attributes are complementary. The information system of table 1 is such that $Ann \equiv_S Bob$ and $Ann \, R_S \, Cindy$. For all $x, y \in Obj$, define:

Weak indiscernibility: $x \cong_S y$ iff there is $a \in Att$ such that $f(a, x) = f(a, y)$;

Weak complementarity: $x \rho_S y$ iff there is $a \in Att$ such that $f(a, x) = Val_a \setminus f(a, y)$.



Intuitively, two objects are weakly indiscernible if some of their respective sets of properties determined by the attributes are indiscernible whereas two objects are weakly complementary if some of their respective sets of properties determined by the attributes are complementary. The information system of table 1 is such that $Ann \cong_S Daniel$ and $Ann \, \rho_S \, Emma$. The structure $(Obj, \equiv_S, R_S, \cong_S, \rho_S)$ is called the abstract structure derived from $S$. We leave it to the reader to prove the following lemmas.



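Lemma 1. For all $x, y, z \in Obj$:

$x \equiv_S x$;
If $x \equiv_S y$ then $y \equiv_S x$;
If $x \equiv_S y$ and $y \equiv_S z$ then $x \equiv_S z$;
If $x \equiv_S y$ and $y R_S z$ then $x R_S z$;
Not $x R_S x$;
If $x R_S y$ then $y R_S x$;
If $x R_S y$ and $y \equiv_S z$ then $x R_S z$;
If $x R_S y$ and $y R_S z$ then $x \equiv_S z$.

Lemma 2. For all $x, y, z \in Obj$:

$x \cong_S x$;
If $x \cong_S y$ then $y \cong_S x$;
If $x \cong_S y$ and $y \equiv_S z$ then $x \cong_S z$;
If $x \cong_S y$ and $y R_S z$ then $x \rho_S z$;
Not $x \rho_S x$;
If $x \rho_S y$ then $y \rho_S x$;
If $x \rho_S y$ and $y \equiv_S z$ then $x \rho_S z$;
If $x \rho_S y$ and $y R_S z$ then $x \cong_S z$.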
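The four relations and the conditions of the two lemmas can be checked mechanically on the information system of table 1. The following Python sketch is illustrative only: the names `VAL`, `F`, `OBJ`, `strong_ind`, `strong_comp`, `weak_ind`, `weak_comp` and `check_lemmas` are assumptions introduced here and are not notation from the paper.

```python
from itertools import product

# Table 1, transcribed: F[attribute][object] is the set of properties f(a, x).
VAL = {
    "Languages": {"Arabic", "Bulgarian", "Castilian", "Dutch"},
    "Sports": {"athletics", "basketball", "cycling"},
}
F = {
    "Languages": {
        "Ann": {"Arabic", "Bulgarian"},
        "Bob": {"Arabic", "Bulgarian"},
        "Cindy": {"Castilian", "Dutch"},
        "Daniel": {"Arabic", "Bulgarian"},
        "Emma": {"Arabic", "Castilian"},
    },
    "Sports": {
        "Ann": {"athletics", "basketball"},
        "Bob": {"athletics", "basketball"},
        "Cindy": {"cycling"},
        "Daniel": {"athletics", "cycling"},
        "Emma": {"cycling"},
    },
}
OBJ = sorted(F["Languages"])


def same(a, x, y):
    """f(a, x) = f(a, y)."""
    return F[a][x] == F[a][y]


def compl(a, x, y):
    """f(a, x) = Val_a \\ f(a, y)."""
    return F[a][x] == VAL[a] - F[a][y]


def strong_ind(x, y):
    return all(same(a, x, y) for a in VAL)


def strong_comp(x, y):
    return all(compl(a, x, y) for a in VAL)


def weak_ind(x, y):
    return any(same(a, x, y) for a in VAL)


def weak_comp(x, y):
    return any(compl(a, x, y) for a in VAL)


def check_lemmas():
    """Assert the conditions of lemma 1 and lemma 2 on every triple of objects."""
    for x, y, z in product(OBJ, repeat=3):
        # Lemma 1: strong indiscernibility and strong complementarity.
        assert strong_ind(x, x) and not strong_comp(x, x)
        assert not strong_ind(x, y) or strong_ind(y, x)
        assert not strong_comp(x, y) or strong_comp(y, x)
        assert not (strong_ind(x, y) and strong_ind(y, z)) or strong_ind(x, z)
        assert not (strong_ind(x, y) and strong_comp(y, z)) or strong_comp(x, z)
        assert not (strong_comp(x, y) and strong_ind(y, z)) or strong_comp(x, z)
        assert not (strong_comp(x, y) and strong_comp(y, z)) or strong_ind(x, z)
        # Lemma 2: weak indiscernibility and weak complementarity.
        assert weak_ind(x, x) and not weak_comp(x, x)
        assert not weak_ind(x, y) or weak_ind(y, x)
        assert not weak_comp(x, y) or weak_comp(y, x)
        assert not (weak_ind(x, y) and strong_ind(y, z)) or weak_ind(x, z)
        assert not (weak_ind(x, y) and strong_comp(y, z)) or weak_comp(x, z)
        assert not (weak_comp(x, y) and strong_ind(y, z)) or weak_comp(x, z)
        assert not (weak_comp(x, y) and strong_comp(y, z)) or weak_ind(x, z)
    return True
```

Calling `check_lemmas()` exercises every condition on the five objects of the example; it is a sanity check on one system, not a proof of the lemmas.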



Lemma 1 and lemma 2 motivate the following definition. An abstract structure is a structure $(W, \equiv, R, \cong, \rho)$ where:

- $W$ is a nonempty set of possible worlds;

- $\equiv$ and $R$ are binary relations on $W$ subject to the conditions of lemma 1;

- $\cong$ and $\rho$ are binary relations on $W$ subject to the conditions of lemma 2.

In section 3, the concept of abstract structure is used to give a first-order characterization of indiscernibility relations and complementarity relations in information systems. In section 4, it is used to give a modal analysis of the same relations.

3 First-Order Characterization 

The concept of abstract structure lets us give a first-order characterization of indiscernibility relations and complementarity relations in information systems. The following important theorem explains the connection between abstract structures and information systems. The whole of section 3 is devoted to its proof.

Theorem 3. Let $F = (W, \equiv, R, \cong, \rho)$ be an abstract structure. There is an information system $S = (Att, Obj, \{Val_a \mid a \in Att\}, f)$ such that $Obj = W$ and for all $x, y \in Obj$:

$x \equiv_S y$ iff $x \equiv y$; $x \cong_S y$ iff $x \cong y$;

$x R_S y$ iff $x R y$; $x \rho_S y$ iff $x \rho y$.

As a consequence, abstract structures and information systems have equal mathematical content as far as indiscernibility relations and complementarity relations are concerned. Holding the proof of theorem 3 in abeyance for a while, we proceed to introduce the concepts of indiscernibility set, positive set, negative set and good set. Two subsets $A$ and $B$ of $W$ are called comparable if $A \subseteq B$ or $B \subseteq A$. A set of pairwise comparable subsets of $W$ is called a chain. A subset $A$ of $W$ such that for all $x, y \in W$:

- If $x \equiv y$ and $x \in A$ then $y \in A$;

- If $x \equiv y$ and $x \notin A$ then $y \notin A$;

will be defined to be an indiscernibility set. An indiscernibility set $A$ such that for all $x, y \in W$:

- If $x R y$ and $x \in A$ then $y \notin A$;

will be defined to be a positive set. An indiscernibility set $A$ such that for all $x, y \in W$:

- If $x R y$ and $x \notin A$ then $y \in A$;

will be defined to be a negative set. An indiscernibility set $A$ such that $A$ is a positive set and $A$ is a negative set will be defined to be a good set. The proof of the following lemmas is left as an exercise for the reader.

Lemma 4.
- $\emptyset$ and $W$ are indiscernibility sets.

- For all $x \in W$, $\equiv(x)$ is an indiscernibility set.

- For all $x, y \in W$, $\equiv(x) \cup \equiv(y)$ is an indiscernibility set.

- For all indiscernibility sets $A$, $W \setminus A$ is an indiscernibility set.

- For all families $(A_i \mid i \in I)$ of indiscernibility sets, $\bigcup (A_i \mid i \in I)$ is an indiscernibility set and $\bigcap (A_i \mid i \in I)$ is an indiscernibility set.

Lemma 5.
- $\emptyset$ is a positive set.

- For all $x \in W$, $\equiv(x)$ is a positive set.

- For all $x, y \in W$, if not $x R y$ then $\equiv(x) \cup \equiv(y)$ is a positive set.

- For all positive sets $A$, $W \setminus A$ is a negative set.

- For all chains $(A_i \mid i \in I)$ of positive sets, $\bigcup (A_i \mid i \in I)$ is a positive set.

Lemma 6.
- $W$ is a negative set.

- For all $x \in W$, $\not\equiv(x)$ is a negative set.

- For all $x, y \in W$, if not $x R y$ then $\not\equiv(x) \cap \not\equiv(y)$ is a negative set.

- For all negative sets $A$, $W \setminus A$ is a positive set.

- For all chains $(A_i \mid i \in I)$ of negative sets, $\bigcap (A_i \mid i \in I)$ is a negative set.

Lemma 7. Let $A$ be a positive set and $x \in W$ be such that $A \cup \equiv(x)$ is not a positive set. Then there is $y \in W$ such that $y \in A$ and $x R y$.

Lemma 8. Let $A$ be a negative set and $x \in W$ be such that $A \cap \not\equiv(x)$ is not a negative set. Then there is $y \in W$ such that $y \notin A$ and $x R y$.

A more important further consequence is the following lemma.

Lemma 9. Let $A$ be a positive set, $B$ be a negative set and $x \in W$ be such that $A \subseteq B$. Then ($A \cup \equiv(x)$ is a positive set and $A \cup \equiv(x) \subseteq B$) or ($B \cap \not\equiv(x)$ is a negative set and $A \subseteq B \cap \not\equiv(x)$).

Proof. See Balbiani and Vakarelov [1] for details. 

An important related result is the following proposition. 

Proposition 10. Let $A$ be a positive set and $B$ be a negative set such that $A \subseteq B$. Then there is a good set $C$ such that $A \subseteq C$ and $C \subseteq B$.

Proof. See Balbiani and Vakarelov [1] for details.

A set $\alpha$ of good sets such that for all $x, y \in W$:

- If not $x \cong y$ then there is $A \in \alpha$ such that $x \in A$ iff $y \notin A$;

- If not $x \rho y$ then there is $A \in \alpha$ such that $x \in A$ iff $y \in A$;

will be defined to be a nice set. The following lemma is easy to check. 

Lemma 11. The set of all good sets is a nice set. 

Proof. See Balbiani and Vakarelov [1] for details. 

A less obvious result is the following lemma. 
Lemma 12. For all $x, y \in W$:

- $x \equiv y$ iff for all nice sets $\alpha$ and for all $A \in \alpha$, $x \in A$ iff $y \in A$;

- $x R y$ iff for all nice sets $\alpha$ and for all $A \in \alpha$, $x \in A$ iff $y \notin A$;

- $x \cong y$ iff there is a nice set $\alpha$ such that for all $A \in \alpha$, $x \in A$ iff $y \in A$;

- $x \rho y$ iff there is a nice set $\alpha$ such that for all $A \in \alpha$, $x \in A$ iff $y \notin A$.

Proof. See Balbiani and Vakarelov [1] for details.

Referring to lemma 12, we easily obtain a proof of theorem 3. Let $S = (Att, Obj, \{Val_a \mid a \in Att\}, f)$ be the information system defined as follows:

- $Att$ is the set of all nice sets;

- $Obj$ is the set of all possible worlds;

- For all $a \in Att$, $Val_a$ is the set of all good sets $A$ such that $A \in a$;

- For all $a \in Att$ and for all $x \in Obj$, $f(a, x)$ is the set of all good sets $A$ such that $A \in a$ and $x \in A$.

The reader may easily verify that for all $x, y \in Obj$:

$x \equiv_S y$ iff $x \equiv y$; $x \cong_S y$ iff $x \cong y$;

$x R_S y$ iff $x R y$; $x \rho_S y$ iff $x \rho y$.

4 Modal Analysis 

The concept of abstract structure also lets us give a modal analysis of indiscernibility relations and complementarity relations in information systems. The reader is assumed to be familiar with the general concepts of modal logic; see Hughes and Cresswell [4] for details. Since the irreflexivity condition “not $x R x$” of lemma 1 and the irreflexivity condition “not $x \rho x$” of lemma 2 are not modally definable, we need to introduce the concept of nonstandard abstract structure. A nonstandard abstract structure is a structure $(W, \equiv, R, \cong, \rho)$ where:

- $W$ is a nonempty set of possible worlds;

- $\equiv$ and $R$ are binary relations on $W$ subject to the conditions of lemma 1 except the condition “not $x R x$”;

- $\cong$ and $\rho$ are binary relations on $W$ subject to the conditions of lemma 2 except the condition “not $x \rho x$”.

The linguistic basis of our modal logic is the propositional calculus enlarged with the modalities $[\equiv]$, $[R]$, $[\cong]$ and $[\rho]$ corresponding to the indiscernibility relations and the complementarity relations in information systems. We define the set of all formulas as follows:

- $A ::= p \mid \neg A \mid (A \lor B) \mid [\equiv]A \mid [R]A \mid [\cong]A \mid [\rho]A$

where $p$ ranges over a countably infinite set of propositional variables. The other standard connectives are defined by the usual abbreviations. In particular, $\langle\equiv\rangle A$ is $\neg[\equiv]\neg A$, $\langle R\rangle A$ is $\neg[R]\neg A$, $\langle\cong\rangle A$ is $\neg[\cong]\neg A$ and $\langle\rho\rangle A$ is $\neg[\rho]\neg A$. We follow the standard rules for omission of the parentheses. A model (respectively: a nonstandard model) is a structure $(W, \equiv, R, \cong, \rho, V)$ where:
- $(W, \equiv, R, \cong, \rho)$ is an abstract structure (respectively: a nonstandard abstract structure);

- $V$ is a function with domain the set of all propositional variables and range the power set of $W$.

Let $M = (W, \equiv, R, \cong, \rho, V)$ be either a model or a nonstandard model. We define the relation “formula $A$ is true at possible world $x$ in $M$”, denoted $M, x \models A$, as follows:

- $M, x \models p$ iff $x \in V(p)$;

- $M, x \models \neg A$ iff $M, x \not\models A$;

- $M, x \models A \lor B$ iff $M, x \models A$ or $M, x \models B$;

- $M, x \models [\equiv]A$ iff for all $y \in W$, if $x \equiv y$ then $M, y \models A$;

- $M, x \models [R]A$ iff for all $y \in W$, if $x R y$ then $M, y \models A$;

- $M, x \models [\cong]A$ iff for all $y \in W$, if $x \cong y$ then $M, y \models A$;

- $M, x \models [\rho]A$ iff for all $y \in W$, if $x \rho y$ then $M, y \models A$.
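The truth conditions above translate directly into a recursive evaluator over finite models. The sketch below is only an illustration under assumed encodings: formulas are nested tuples, a model is a dictionary, and the keys `"ind"`, `"R"`, `"wind"`, `"rho"` (for the two indiscernibility and two complementarity relations) as well as the helper names are hypothetical choices made here.

```python
def holds(model, x, formula):
    """Return True iff `formula` is true at world `x` of the finite `model`."""
    kind = formula[0]
    if kind == "var":                      # propositional variable p
        return x in model["V"].get(formula[1], set())
    if kind == "not":
        return not holds(model, x, formula[1])
    if kind == "or":
        return holds(model, x, formula[1]) or holds(model, x, formula[2])
    if kind == "box":                      # ("box", relation_name, subformula)
        relation = model[formula[1]]
        return all(holds(model, y, formula[2]) for (u, y) in relation if u == x)
    raise ValueError(f"unknown connective: {kind}")


def implies(a, b):
    """Abbreviation: a -> b is (not a) or b."""
    return ("or", ("not", a), b)


def diamond(relation_name, a):
    """Abbreviation: <r>a is not [r] not a."""
    return ("not", ("box", relation_name, ("not", a)))


# A two-world model in which the worlds are strongly complementary to each other.
MODEL = {
    "W": {0, 1},
    "ind": {(0, 0), (1, 1)},
    "R": {(0, 1), (1, 0)},
    "wind": {(0, 0), (1, 1)},
    "rho": {(0, 1), (1, 0)},
    "V": {"p": {0}},
}
P = ("var", "p")
```

For instance, `holds(MODEL, 0, ("box", "R", ("not", P)))` evaluates the formula $[R]\neg p$ at world 0, whose only $R$-successor is world 1.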

An alternative formulation is “$M$ satisfies formula $A$ at possible world $x$”. The following lemma is basic.

Lemma 13. The following conditions are equivalent. 

1. A is true at some possible world in some finite model; 

2. A is true at some possible world in some model; 

3. A is true at some possible world in some nonstandard model; 

4. A is true at some possible world in some finite nonstandard model.

Proof. (1 implies 2): Obvious.

(2 implies 3): Obvious.

(3 implies 4): Let $M = (W, \equiv, R, \cong, \rho, V)$ be a nonstandard model and $M' = (W', \equiv', R', \cong', \rho', V')$ be the finite nonstandard model defined as follows. Let $\Gamma_A$ be the smallest set of formulas containing the set $Sf(A)$ of all subformulas of $A$ and such that for all formulas $B$, if $[\equiv]B \in \Gamma_A$ or $[R]B \in \Gamma_A$ or $[\cong]B \in \Gamma_A$ or $[\rho]B \in \Gamma_A$ then $[\equiv]B \in \Gamma_A$ and $[R]B \in \Gamma_A$ and $[\cong]B \in \Gamma_A$ and $[\rho]B \in \Gamma_A$. It should be remarked that $Card(\Gamma_A) < 4 \times Card(Sf(A))$. Let $\equiv_{\Gamma_A}$ be the equivalence relation on $W$ defined as follows. For all $x, y \in W$, define:

- $x \equiv_{\Gamma_A} y$ iff for all formulas $B$, if $B \in \Gamma_A$ then $M, x \models B$ iff $M, y \models B$.

For all $x \in W$, the equivalence class of $x$ modulo $\equiv_{\Gamma_A}$ is denoted $|x|$. The quotient set of $W$ modulo $\equiv_{\Gamma_A}$ is denoted by $W_{|\equiv_{\Gamma_A}}$. Define:

- $W'$ is $W_{|\equiv_{\Gamma_A}}$;

- For all $x, y \in W$, $|x| \equiv' |y|$ iff for all formulas $B$, if $[\equiv]B \in \Gamma_A$ then:

If $M, x \models [\equiv]B$ then $M, y \models [\equiv]B$; If $M, x \models [\cong]B$ then $M, y \models [\cong]B$;

If $M, y \models [\equiv]B$ then $M, x \models [\equiv]B$; If $M, y \models [\cong]B$ then $M, x \models [\cong]B$;

If $M, x \models [R]B$ then $M, y \models [R]B$; If $M, x \models [\rho]B$ then $M, y \models [\rho]B$;

If $M, y \models [R]B$ then $M, x \models [R]B$; If $M, y \models [\rho]B$ then $M, x \models [\rho]B$;
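- For all $x, y \in W$, $|x| R' |y|$ iff for all formulas $B$, if $[R]B \in \Gamma_A$ then:

If $M, x \models [\equiv]B$ then $M, y \models [R]B$; If $M, x \models [\cong]B$ then $M, y \models [\rho]B$;

If $M, y \models [\equiv]B$ then $M, x \models [R]B$; If $M, y \models [\cong]B$ then $M, x \models [\rho]B$;

If $M, x \models [R]B$ then $M, y \models [\equiv]B$; If $M, x \models [\rho]B$ then $M, y \models [\cong]B$;

If $M, y \models [R]B$ then $M, x \models [\equiv]B$; If $M, y \models [\rho]B$ then $M, x \models [\cong]B$;

- For all $x, y \in W$, $|x| \cong' |y|$ iff for all formulas $B$, if $[\cong]B \in \Gamma_A$ then:

If $M, x \models [\cong]B$ then $M, y \models [\equiv]B$; If $M, x \models [\rho]B$ then $M, y \models [R]B$;

If $M, y \models [\cong]B$ then $M, x \models [\equiv]B$; If $M, y \models [\rho]B$ then $M, x \models [R]B$;

- For all $x, y \in W$, $|x| \rho' |y|$ iff for all formulas $B$, if $[\rho]B \in \Gamma_A$ then:

If $M, x \models [\cong]B$ then $M, y \models [R]B$; If $M, x \models [\rho]B$ then $M, y \models [\equiv]B$;

If $M, y \models [\cong]B$ then $M, x \models [R]B$; If $M, y \models [\rho]B$ then $M, x \models [\equiv]B$;

- For all propositional variables $p$, $V'(p)$ is $V(p)_{|\equiv_{\Gamma_A}}$.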

It follows immediately that M' is a filtration of M . As a consequence, if A is 
true at some possible world in M then A is true at some possible world in M' . 
(4 implies 1): Let $M = (W, \equiv, R, \cong, \rho, V)$ be a finite nonstandard model and $M' = (W', \equiv', R', \cong', \rho', V')$ be the finite model defined as follows. Define:

- $W'$ is $W \times \{0, 1\}$;

- For all $x, y \in W$ and for all $i, j \in \{0, 1\}$, $(x, i) \equiv' (y, j)$ iff $x \equiv y$ and $i = j$;

- For all $x, y \in W$ and for all $i, j \in \{0, 1\}$, $(x, i) R' (y, j)$ iff $x R y$ and $i = 1 - j$;

- For all $x, y \in W$ and for all $i, j \in \{0, 1\}$, $(x, i) \cong' (y, j)$ iff $x \cong y$ and $i = j$;

- For all $x, y \in W$ and for all $i, j \in \{0, 1\}$, $(x, i) \rho' (y, j)$ iff $x \rho y$ and $i = 1 - j$;

- For all propositional variables $p$, $V'(p)$ is $V(p) \times \{0, 1\}$.

It follows immediately that $M$ is a p-morphic image of $M'$. As a consequence, if $A$ is true at some possible world in $M$ then $A$ is true at some possible world in $M'$. □
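The doubling construction used for (4 implies 1) is easy to run on a small example. The sketch below is an illustration under assumed encodings: a model is a dictionary of sets of pairs, and the keys `"ind"`, `"R"`, `"wind"`, `"rho"` and the function name `double` are hypothetical names chosen here.

```python
def double(model):
    """Build M' from a finite nonstandard model M by taking two copies of each world.

    Indiscernibility relations stay inside a copy (i = j); complementarity
    relations always cross between the copies (i = 1 - j), which makes them
    irreflexive even when the original relations are not.
    """
    def lift(relation, cross):
        return {((x, i), (y, 1 - i if cross else i))
                for (x, y) in relation for i in (0, 1)}

    return {
        "W": {(x, i) for x in model["W"] for i in (0, 1)},
        "ind": lift(model["ind"], cross=False),
        "R": lift(model["R"], cross=True),
        "wind": lift(model["wind"], cross=False),
        "rho": lift(model["rho"], cross=True),
        "V": {p: {(x, i) for x in worlds for i in (0, 1)}
              for p, worlds in model["V"].items()},
    }


# A one-world nonstandard model whose complementarity relations are reflexive.
NONSTANDARD = {
    "W": {0},
    "ind": {(0, 0)},
    "R": {(0, 0)},
    "wind": {(0, 0)},
    "rho": {(0, 0)},
    "V": {"p": {0}},
}
DOUBLED = double(NONSTANDARD)
```

In `DOUBLED` the single reflexive $R$-loop of the original model becomes a pair of arrows between the two copies of the world, so the projection $(x, i) \mapsto x$ still maps $R'$-arrows to $R$-arrows.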

Now we turn to the axiomatization of the set of all formulas true at all possible 
worlds in all models. Let LSWIC — logic of strong and weak indiscernibility 
and complementarity — be the smallest normal modal logic that contains the 
axioms of table 2. A typical result is the following. 

Theorem 14. LSWIC is complete with respect to the class of all models and 
the class of all nonstandard models, i.e. the following conditions are equivalent. 

1. A is true at all possible worlds in all models; 

2. A is true at all possible worlds in all nonstandard models; 

3. A is a theorem of LSWIC . 

Proof. (1 implies 2): By lemma 13. 

(2 implies 1): By lemma 13. 

(2 implies 3): The proof can be obtained by the canonical model construction. 
(3 implies 2): The proof is trivial because nonstandard models satisfy the con- 
ditions which are needed to verify the axioms of LSWIC . □ 
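Table 2. Axioms of LSWIC.

$[\equiv]A \to A$                      | $[\cong]A \to A$
$A \to [\equiv]\langle\equiv\rangle A$ | $A \to [\cong]\langle\cong\rangle A$
$[\equiv]A \to [\equiv][\equiv]A$      | $[\cong]A \to [\cong][\equiv]A$
$[R]A \to [\equiv][R]A$                | $[\rho]A \to [\cong][R]A$
$A \to [R]\langle R\rangle A$          | $A \to [\rho]\langle\rho\rangle A$
$[R]A \to [R][\equiv]A$                | $[\rho]A \to [\rho][\equiv]A$
$[\equiv]A \to [R][R]A$                | $[\cong]A \to [\rho][R]A$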

We now turn our attention to the decidability of the problem of determining whether a given formula is a theorem of LSWIC.

Theorem 15. It is decidable whether a given formula is a theorem of LSWIC.

Proof. By lemma 13 and theorem 14, LSWIC is a finitely axiomatizable normal modal logic which has the finite model property. As a consequence, it is decidable whether a given formula is a theorem of LSWIC. □

5 Conclusion 

We have addressed the issues of first-order characterization and modal analysis of indiscernibility and complementarity in information systems. Previous first-order characterizations and modal analyses have been given by Demri [2], Demri, Orlowska and Vakarelov [3], Orlowska [5,6], Orlowska and Pawlak [7] and Vakarelov [9,10,11,12] who consider indistinguishability relations and distinguishability relations like the similarity relations defined as follows. Let $S = (Att, Obj, \{Val_a \mid a \in Att\}, f)$ be an information system. For all $x, y \in Obj$, define:

Strong positive similarity: $x \sigma_S y$ iff for all $a \in Att$, $f(a, x) \cap f(a, y) \neq \emptyset$;

Strong negative similarity: $x \nu_S y$ iff for all $a \in Att$, $(Val_a \setminus f(a, x)) \cap (Val_a \setminus f(a, y)) \neq \emptyset$;

Weak positive similarity: $x \Sigma_S y$ iff there is $a \in Att$ such that $f(a, x) \cap f(a, y) \neq \emptyset$;

Weak negative similarity: $x N_S y$ iff there is $a \in Att$ such that $(Val_a \setminus f(a, x)) \cap (Val_a \setminus f(a, y)) \neq \emptyset$.

It should be remarked that the strong complementarity relation is definable by 
means of the strong similarity relations as follows: 

- Rs = a^rU7^; 

whereas the weak complementarity relation is definable neither by means of 
the strong similarity relations nor by means of the weak similarity relations. 
First-order characterizations and modal analyses of indiscernibility relations and complementarity relations in information systems together with other indistinguishability relations or distinguishability relations like similarity relations are not known.
Abstract. We present a generalization of the logic All I Know by presenting it as an extension of standard modal logics. We study how this logic can be used to represent complete and incomplete knowledge in Logical Information Systems. In these information systems, a knowledge base is a collection of objects (e.g., files, bibliographical items) described in the same logic as used for expressing queries. We show that the usual All I Know (transitive and euclidean accessibility relation) is convenient for representing complete knowledge, but not for incomplete knowledge. For the latter, we use serial All I Know (serial accessibility relation).



1 Introduction 

The most common paradigms of information systems are hierarchical systems (e.g., file systems), relational databases, and deductive databases. While the first paradigm is based on navigation in a hierarchy built by hand, the other ones are based on a querying language (e.g., SQL, first-order logic). However, it appears in practice that both navigation and querying are needed in information retrieval [GMA93], and none of the three above paradigms offers them simultaneously. A new paradigm of information system, based on Concept Analysis [Wil82], was proposed [GMA93] for tightly combining navigation and querying. Recently, we presented a logical generalization of this new paradigm, which we call Logical Information System (LIS) [FR00], in which an almost arbitrary logic can be used to describe individual objects.

As for deductive databases, a LIS knowledge base is expressed in a logical way. A first difference is that this knowledge base is composed of objects (e.g., files, bibliographic items, web pages) described by formulas, rather than composed of relations. It is called a logical context by reference to Concept Analysis, which serves as a framework for LIS. On this point, a logical context is similar to an ABox in description logics [DJ94,DNR97], except that the logic used is almost arbitrary. A second difference is that the answers are formulas expressed in the same logic as queries, and not sets of objects or values as is usually the case. This enables a “dialogue” between a LIS and a user because they use the same logical language. This dialogue acts as a logical, automatic, and relevant navigation that helps the user in the information retrieval process. Moreover, this navigation makes it possible to start with no knowledge of the contents of a LIS, neither the objects nor the logic in use. It is the logical answers that gradually inform the user about both the logic and the objects.

* This author is supported by a scholarship from CNRS and Region Bretagne

We built a prototype and ran some experiments. They rapidly showed that deduction capabilities in propositional logic were not fully satisfying. For instance, in a bibliographic application, the query $\neg Jones$ returns no answer because bibliographic items are described by the authors they have (e.g., $Smith \land Bond$), and not by the authors they do not have: i.e., negative facts are not explicitly represented. Thus, following Levesque and Reiter [Rei92], we argue that the logic must be equipped with epistemic features to enhance the expressiveness of queries. This would make it possible to describe what is known about the external world, and to query what is known by the information system.

This paper aims at showing how complete and incomplete knowledge can be represented in a LIS. Section 2 shows that Levesque’s logic All I Know (noted ONL) is a well-suited formalism among non-monotonic ones for representing complete knowledge. Section 3 explains why ONL has to be modified for representing incomplete knowledge. The modification consists in replacing the transitive and euclidean accessibility relation by a serial one. Both these sections also present some idiomatics that characterize new notions of truth and make the use of ONL easier for LIS end users who are unfamiliar with modal logic. Finally, Section 4 concludes the paper and draws some perspectives.

2 Expressing Complete Knowledge 

When we have complete knowledge about objects, we want to deduce negative facts about objects without having to assert them in object descriptions. For instance, we do not want to mention the fact that an object satisfies $\neg Jones$, but we want this fact to be deducible from its description because we have complete knowledge about this object. This is obviously a formulation of the well-known Closed World Assumption (CWA), which led to many formalisms for non-monotonic reasoning (e.g., Minimal Belief and Negation as Failure [Lif91,DNR97], Auto-Epistemic Logic [Moo85], Circumscription [McC86], All I Know [Lev90]). However, logics used in LIS must have a monotonic deduction relation because of the framework on which LIS is based, i.e., Concept Analysis. In fact, this framework requires that the logic has a deduction relation that forms a lattice. In other words, we need to apply CWA locally in formulas (especially in object descriptions) rather than globally in the deduction relation. Levesque’s logic All I Know (noted ONL) is precisely a formalism that defines such an operation. Moreover, logic ONL is proved to encompass all these non-monotonic formalisms (see [Che94] for mappings between these formalisms), and there exists a proof method for it [Ros00].

2.1 Logic ONL

In this section, we recall the formalization of logic ONL (here, we consider its semantics [Ros00], but there also exists an axiomatics [Lev90]). The logical language ONL is defined as a propositional language with connectives $\land$, $\neg$ ($\lor$ and $\to$ are defined as abbreviations), whose atomic propositions belong to an infinite set $\mathcal{A}$, and that is extended with modal operators $K$, $N$, $O$. Logic ONL can be given a Kripke semantics: the worlds are valuations of $\mathcal{A}$ in {TRUE, FALSE} extended in the usual way to propositional formulas, and the accessibility relation is defined as a relation between these worlds.

Definition 1 Let $w$ be a world, and let $R$ be a transitive¹ and euclidean² accessibility relation. We say that a structure $(w, R)$ is a model of a formula $\phi \in ONL$, and we note $(w, R) \models \phi$, iff the following conditions hold ($R(w)$ denotes the set of successor worlds of $w$ through $R$):

1. if $\phi \in \mathcal{A}$, then $(w, R) \models \phi$ iff $w(\phi) = TRUE$;

2. if $\phi = \neg\phi_1$, then $(w, R) \models \phi$ iff $(w, R) \not\models \phi_1$;

3. if $\phi = \phi_1 \land \phi_2$, then $(w, R) \models \phi$ iff $(w, R) \models \phi_1$ and $(w, R) \models \phi_2$;

4. if $\phi = K\phi_1$, then $(w, R) \models \phi$ iff for every $w' \in R(w)$, $(w', R) \models \phi_1$;

5. if $\phi = N\phi_1$, then $(w, R) \models \phi$ iff for every $w' \notin R(w)$, $(w', R) \models \phi_1$;

6. if $\phi = O\phi_1$, then $(w, R) \models \phi$ iff for every $w'$, $w' \in R(w)$ iff $(w', R) \models \phi_1$.

We remark here that for all worlds $w, w'$ such that $w R w'$, we have $R(w) = R(w')$ because the accessibility relation is both transitive and euclidean. This means for a structure $(w, R)$ that every world accessible from $w$ (in one or more steps) has the same set of successor worlds $R(w)$. Therefore, we could use instead a structure $(w, W)$ where $W = R(w)$ [Lev90,Ros00]. We do not do so because this property no longer holds in the rest of this paper.

Logic ONL is equipped with a monotonic deduction relation $\models_{ONL}$, which makes it possible to compare object descriptions with queries, but also queries with each other.

Definition 2 A formula $\phi \in ONL$ entails a formula $\phi' \in ONL$ (denoted as $\phi \models_{ONL} \phi'$) iff $\phi \to \phi'$ is ONL-valid, i.e., for every Kripke structure $(w, R)$ where $R$ is transitive and euclidean, $(w, R) \models \phi \to \phi'$.

In order to better understand modal operators, we prove the following lemma.

Lemma 1 If $\phi$ is an ONL-formula and $W_R(\phi) = \{w \mid (w, R) \models \phi\}$ is the set of worlds where $\phi$ is true, then for every structure $(w, R)$

1. $(w, R) \models K\phi$ iff $R(w) \subseteq W_R(\phi)$;

2. $(w, R) \models N\neg\phi$ iff $R(w) \supseteq W_R(\phi)$;

3. $(w, R) \models O\phi$ iff $R(w) = W_R(\phi)$.

Proof 1 The proofs of the three items are obtained directly from Definition 1 and are similar. So, we detail the proof only for modality $K$.

(1) $(w, R) \models K\phi \iff \forall w' \in R(w) : (w', R) \models \phi$

$\iff \forall w' : w' \in R(w) \Rightarrow w' \in W_R(\phi) \iff R(w) \subseteq W_R(\phi)$. ■

¹ A relation $R$ is transitive iff $\forall w, w', w''$ : $w R w'$ and $w' R w''$ implies $w R w''$.
² A relation $R$ is euclidean iff $\forall w, w', w''$ : $w R w'$ and $w R w''$ implies $w' R w''$.
This lemma shows that in a model $(w, R)$ of a modal formula, what is important is neither the initial world $w$ nor the accessibility relation itself, but the set of successor worlds $R(w)$. Therefore, these modal formulas $M\phi$ ($M \in \{K, N\neg, O\}$) describe sets of models of $\phi$, rather than individual models of $\phi$. For instance, modal formula $K\phi$ describes some subsets of $W_R(\phi)$, in which $\phi$ is always true but not only $\phi$. So, $K\phi$ can be read as “at least $\phi$”. Dually, modal formula $N\neg\phi$ can be read as “at most $\phi$”, and modal formula $O\phi$, which is semantically equivalent to $K\phi \land N\neg\phi$ according to Definition 1, can be read as “exactly $\phi$” or “all I know is $\phi$” (hence, the original name of logic ONL [Lev90]).

2.2 Completing Knowledge with Logic ONL

We now consider the representation of complete knowledge with logic ONL. For instance, in the context of a bibliography, we want to represent the authors of a document $o$. If $A$ and $B$ are authors (represented as independent atoms), we can logically describe this document with the propositional formula $d(o) = A \land B$. Then, $o$ is an answer to the query $q = A$ (because $d(o)$ entails $q$), but is not an answer to the query $q' = \neg C$ (because $d(o)$ does not entail $q'$). But, if the knowledge expressed in $d(o)$ is complete, we want to deduce from it that $C$ is not an author of the considered document.

For this, we propose to complete object descriptions by embedding them in modal operator $O$ (this idea has already been proposed, e.g., in the conclusion of [Rei92]). In our example, we establish the following entailments (by means of a tableau calculus [Ros00])

$O(A \land B) \models_{ONL} K(A)$, $O(A \land B) \models_{ONL} \neg K(C)$,

which can be translated in English as “if A and B are authors and the only ones, then A is an author and C is not”.
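These two entailments can also be checked by brute force when the language is restricted to finitely many atoms. The sketch below is not the tableau calculus cited in the text: it relies on the observation that, for a modality applied to a non-modal formula, truth in $(w, R)$ depends only on the set of successor worlds $R(w)$, and it assumes a three-atom language. All names (`ATOMS`, `models`, `successor_sets`, `K`, `O`, `neg`, `entails`) are hypothetical.

```python
from itertools import combinations, product

ATOMS = ("A", "B", "C")  # assumption: a finite fragment of the atom set
# A world is represented by the frozenset of atoms it makes true.
WORLDS = [frozenset(a for a, bit in zip(ATOMS, bits) if bit)
          for bits in product((False, True), repeat=len(ATOMS))]


def models(prop):
    """Worlds satisfying a non-modal formula, given as a predicate on worlds."""
    return frozenset(w for w in WORLDS if prop(w))


def successor_sets():
    """Every candidate set of successor worlds R(w)."""
    for size in range(len(WORLDS) + 1):
        for chosen in combinations(WORLDS, size):
            yield frozenset(chosen)


def K(prop):
    """K phi holds iff R(w) is included in the models of phi."""
    return lambda succ: succ <= models(prop)


def O(prop):
    """O phi holds iff R(w) is exactly the set of models of phi."""
    return lambda succ: succ == models(prop)


def neg(modal):
    return lambda succ: not modal(succ)


def entails(premise, conclusion):
    """Validity of premise -> conclusion over all candidate successor sets."""
    return all(conclusion(succ) for succ in successor_sets() if premise(succ))


a = lambda w: "A" in w
c = lambda w: "C" in w
a_and_b = lambda w: "A" in w and "B" in w
```

With these definitions, `entails(O(a_and_b), neg(K(c)))` holds whereas `entails(K(a_and_b), neg(K(c)))` does not, which illustrates why the description has to be embedded in $O$ rather than in $K$.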

2.3 Idiomatics for Complete Knowledge 

Following Reiter [Rei92], we think that modal operators are not convenient for naive end users, who tend to reason on non-modal formulas. For this, we introduce some idiomatics for expressing descriptions and queries in an easier way:

description: $[d] = Od$, where $d$ is a non-modal formula describing an object: e.g., $[A \land \neg B]$;

query: proposition whose atoms are either $+q = Kq$ or $-q = \neg Kq$, $q$ being a non-modal formula: e.g., $(+(A \land B) \lor +\neg B) \land -C$.

These idiomatics are not arbitrary and characterize the deducibility of some properties in propositional logic, as proved by the following lemma.

Lemma 2 Let $\models_{L}$ be a deduction relation on non-modal formulas (propositions). For all non-modal formulas $d, q$

1. $[d] \models_{ONL} +q$ iff $d \models_{L} q$;

2. $[d] \models_{ONL} -q$ iff $d \not\models_{L} q$.







S. Ferre 



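Proof 2

(1) $[d] \models_{ONL} +q \iff (Od \to Kq)$ is valid (Definition 2)

$\iff \forall (w, R) : (w, R) \models Od \Rightarrow (w, R) \models Kq$ (Definition 1)

$\iff \forall (w, R) : R(w) = W_R(d) \Rightarrow R(w) \subseteq W_R(q)$ (Lemma 1)

$\iff \forall R : W_R(d) \subseteq W_R(q) \iff d \models_{L} q$ ($d, q$ are non-modal formulas).

(2) similar to proof (1). ■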



If a non-modal formula $d$ represents the knowledge we have about an object, then $[d] \models_{ONL} +q$ means “one knows $q$” because this is equivalent to $d \models_{L} q$ (Lemma 2). Conversely, $[d] \models_{ONL} -q$ means “one does not know $q$”. Now, if we use $[d]$ to represent complete knowledge, i.e., everything unknown is considered as false, we must read $+q$ as “$q$ is true”, and $-q$ as “$q$ is false”. Here, truth and falsity are expressed from a knowledge point of view, whereas from a real world point of view, they would be expressed by $q$ and $\neg q$.

Lemma 2 shows how a subset of logic ONL can be used to represent complete knowledge. While this subset is simple, it enables some fine distinctions. First, $+q_1 \lor +q_2 \models_{ONL} +(q_1 \lor q_2)$ while the converse is false ($[q_1 \lor q_2]$ is a counter-example): $+q_1 \lor +q_2$ represents determination (at least $q_1$ or $q_2$ is known as true), whereas $+(q_1 \lor q_2)$ represents some indetermination in knowledge ($q_1 \lor q_2$ is known as true, but which part is true can be unknown). Second, $+\neg q \models_{ONL} -q$ while the converse is false ($[d]$ is a counter-example if $d$ entails neither $q$ nor $\neg q$): $+\neg q$ represents explicit falsity ($q$ is known as false), whereas $-q$ represents absence of truth ($q$ is not known as true, but it is not necessarily known as false either).
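By Lemma 2, evaluating $+q$ and $-q$ against a description $[d]$ reduces to propositional entailment, so the two distinctions above can be reproduced with a truth-table check. This is a minimal sketch under that reduction; the function names `entails_prop`, `plus` and `minus` are hypothetical, and formulas are encoded as Python predicates on valuations.

```python
from itertools import product


def entails_prop(d, q, atoms):
    """Propositional entailment d |= q, decided by enumerating all valuations."""
    for bits in product((False, True), repeat=len(atoms)):
        valuation = dict(zip(atoms, bits))
        if d(valuation) and not q(valuation):
            return False
    return True


def plus(d, q, atoms):
    """[d] entails +q, using item 1 of Lemma 2."""
    return entails_prop(d, q, atoms)


def minus(d, q, atoms):
    """[d] entails -q, using item 2 of Lemma 2."""
    return not entails_prop(d, q, atoms)


ATOMS = ("q1", "q2")
q1 = lambda v: v["q1"]
q2 = lambda v: v["q2"]
q1_or_q2 = lambda v: v["q1"] or v["q2"]
not_q2 = lambda v: not v["q2"]
```

For the description $[q_1 \lor q_2]$, `plus(q1_or_q2, q1_or_q2, ATOMS)` holds while neither `plus(q1_or_q2, q1, ATOMS)` nor `plus(q1_or_q2, q2, ATOMS)` does. Likewise, for the description $[q_1]$, `minus(q1, q2, ATOMS)` holds although `plus(q1, not_q2, ATOMS)` does not.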

In the following section, we show that even finer knowledge distinctions can be made by means of logic ONL, e.g., taking into account incomplete knowledge.

3 Expressing Incomplete Knowledge 

From Lemma 2, it follows that for every non-modal description $d$ and every non-modal query $q$, $[d] = Od$ always entails either $+q = Kq$ or $-q = \neg Kq$, i.e., we have complete knowledge with descriptions embedded in modal operator $O$. We recall that for every formula $d \in ONL$, $Od$ can be defined as $Kd \land N\neg d$, which can be read as “at least $d$ and at most $d$”. Each part of this definition expresses incomplete knowledge, and it is the conjunction of both parts applied to the same formula that forms complete knowledge. The issue of this section is to find how to express incomplete knowledge by using modalities $K$ and $N$ in a less tight combination than in the definition of modality $O$. For this, we consider different formulas for the at least and at most parts, where the latter must entail the former for consistency reasons (see Lemma 1). So, incomplete descriptions are of the form $Kd \land N\neg(d \land d')$, which we note $[d, d']$.

3.1 Examples and Problems 

As in Section 2, we consider some examples about the representation of authors in a bibliographic application. Authors are simply represented by independent atoms ($A, B, C, D, \ldots$). We present some modal formulas with an expected meaning:

1. $d(o_1) = [A \land B, \bot] = K(A \land B)$: “at least $A \land B$”, i.e., “A and B are certainly authors, but there are possibly other ones”;

2. $d(o_2) = [\top, A \land B \land C] = N\neg(A \land B \land C)$: “at most $A \land B \land C$”, i.e., “A, B, and C are the only possible authors, but we do not know exactly which ones actually are”;

3. $d(o_3) = [A \land B, C] = K(A \land B) \land N\neg(A \land B \land C)$: “A and B are certainly authors, C is possibly one too, but there is no other one”.

In order to know whether these formulas meet their expected meaning, we look at what can be deduced from them in ONL. The following table summarizes some entailments for each example. For instance, it shows that $d(o_3)$ entails $+A$ and $-D$, and that neither $+C$ nor $-C$ is deducible.
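The entailments quoted for the three examples can be recomputed by brute force. The sketch below assumes a four-atom language and uses Lemma 1 to read $[d, d'] = Kd \land N\neg(d \land d')$ as the constraint that $R(w)$ contains the models of $d \land d'$ and is contained in the models of $d$. The names `description`, `plus`, `minus`, `atom` and the `O1`, `O2`, `O3` constants are hypothetical.

```python
from itertools import combinations, product

ATOMS = ("A", "B", "C", "D")  # assumption: a finite fragment of the atom set
# A world is represented by the frozenset of atoms it makes true.
WORLDS = [frozenset(a for a, bit in zip(ATOMS, bits) if bit)
          for bits in product((False, True), repeat=len(ATOMS))]


def models(prop):
    """Worlds satisfying a non-modal formula, given as a predicate on worlds."""
    return frozenset(w for w in WORLDS if prop(w))


def description(d, d_prime):
    """All successor sets allowed by [d, d']: models(d and d') <= R(w) <= models(d)."""
    lower = models(lambda w: d(w) and d_prime(w))
    free = list(models(d) - lower)
    return [lower | frozenset(extra)
            for size in range(len(free) + 1)
            for extra in combinations(free, size)]


def plus(desc, q):
    """The description entails +q = Kq."""
    target = models(q)
    return all(succ <= target for succ in desc)


def minus(desc, q):
    """The description entails -q = not Kq."""
    target = models(q)
    return all(not succ <= target for succ in desc)


def atom(name):
    return lambda w: name in w


top = lambda w: True
bottom = lambda w: False
not_a = lambda w: "A" not in w
a_and_b = lambda w: "A" in w and "B" in w
a_b_c = lambda w: a_and_b(w) and "C" in w

O1 = description(a_and_b, bottom)     # d(o1) = [A and B, false]
O2 = description(top, a_b_c)          # d(o2) = [true, A and B and C]
O3 = description(a_and_b, atom("C"))  # d(o3) = [A and B, C]
```

On `O3`, `plus` holds for $A$ and `minus` holds for $D$, while neither holds for $C$. On `O1`, `minus(O1, not_a)` fails because the empty successor set is allowed, whereas `minus(O2, not_a)` holds.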

Two problems are revealed by these examples. The first one is that half of the above table is empty, which corresponds to an extra-logical property ($d(o_3) \not\models_{ONL} +C$ and $d(o_3) \not\models_{ONL} -C$). This means that we cannot ask for documents where author $C$ is possible, that is where $C$ is neither true nor false. This notion of possibility is necessary with incomplete knowledge, which we try to represent, but we want to express it in the logic itself.

The second problem is that $\neg A$ is possible in $d(o_1)$, whereas $A$ is true. This means that a model $(w, R)$ where $R(w) = \emptyset$ is considered. In this model, object $o_1$ is considered as an impossible object. As objects do exist in the real world, we want to exclude this possibility, and to deduce that $\neg A$ is false like in $d(o_2)$. These two problems are addressed in the next section.

3.2 Generalization of Logic ONL

The first problem revealed in the previous section is about expressing a possible fact $q$, that is a fact which is neither true ($[d, d'] \not\models_{ONL} +q$) nor false ($[d, d'] \not\models_{ONL} -q$), where $[d, d']$ is an incomplete description. This problem is in fact similar to the one of Section 2.2 about complete knowledge, and it is tempting to adopt a similar solution, i.e., to embed the object descriptions of Section 3.1 in modal operator $O$ and to embed undeducible facts in modality $\neg K$ (see Lemma 2). Therefore, the description becomes $O[d, d'] = O(Kd \land N\neg(d \land d'))$, and we expect to express that proposition $q$ is possible by:

$O[d, d'] \models_{ONL} \neg K(+q) \land \neg K(-q)$,

$O(Kd \land N\neg(d \land d')) \models_{ONL} \neg K(Kq) \land \neg K(\neg Kq)$.

Unfortunately, formulas of the form $O[d, d']$ are not ONL-satisfiable (see Example 2
of Section 2 in [Ros00]). To explain this, we first need to recall
that a structure $(w, R)$ can be replaced by a structure $(w, W)$ where $W$ is
the constant world set $R(w)$ (see Section 2.1). Now, let $(w, W)$ be a model
of $O[A \wedge B, \bot] = OK(A \wedge B)$. If $W \subseteq W(A \wedge B)$, then for every world $w'$,
$(w', W)$ is a model of $K(A \wedge B)$ as $W = R(w')$ (Definition 1). Then, from the
semantics of modality $O$, every world $w'$ belongs to $W$ since $R(w) = W$. This
is contradictory because $A \wedge B$ is not a tautology. However, $W \not\subseteq W(A \wedge B)$
contradicts that $(w, W)$ is a model of $OK(A \wedge B)$.

In the same way as formula $O(A \wedge B)$ enables us to reason on all models
of $A \wedge B$, we would like formula $OK(A \wedge B)$ to enable us to reason on all
models of $K(A \wedge B)$, i.e., on all structures $(w, R)$ such that $R(w) \subseteq W_R(A \wedge B)$
(see Lemma 1). For this, it is necessary that $R(w)$ does depend on world $w$,
in order to keep the meaning of an incomplete knowledge. This is why we
propose to generalize logic ONL by removing the transitivity and euclidean conditions
on the accessibility relation from Definition 1. Therefore, we can see logic ONL
as an ordinary modal logic where $K$ is the main modal operator defined on
accessible worlds $R(w)$, whereas $N$ is a dual modal operator defined on
unaccessible worlds $\overline{R(w)}$, and $O$ is simply defined as a combination of $K$ and $N$
($O\phi = K\phi \wedge N\neg\phi$). Then, a whole family of ONL-logics can be derived by applying
various conditions on the accessibility relation, as is done for usual modal
logics [Bow79]. For instance, the usual logic ONL has a transitive and euclidean
accessibility relation and can thus be renamed K45-ONL, whereas our
generalization leads to an arbitrary accessibility relation and can be named K-ONL.



Definition 3 Semantics and entailment of logic K-ONL are defined as in
Definitions 1 and 2, except that there is no condition on the accessibility relation.

With logic K-ONL, the knowledge is stratified because the accessibility relation
is neither transitive nor euclidean. Object description $Od(o_1) = OK(A \wedge B)$ is
now satisfiable and can be read as three levels of knowledge: a model of

1. $A \wedge B$ is a world $w$ satisfying both A and B;

2. $K(A \wedge B)$ is a world $w$ whose $R(w)$ is a set of models of $A \wedge B$;

3. $OK(A \wedge B)$ is a world $w$ whose $R(w)$ collects all models of $K(A \wedge B)$.

This description represents a complete knowledge about an incomplete
knowledge about object $o_1$: "All I know about object $o_1$ is that it has at least authors A
and B". It allows the following entailment with the idiomatics of Section 2.3.

Proposition 1 $O[A \wedge B, \bot] \models_{K\text{-}ONL}
K(+A) \wedge (\neg K(+\neg A) \wedge \neg K(-\neg A)) \wedge (\neg K(+C) \wedge \neg K(-C))$.

This means that $Od(o_1)$ entails "one knows that A is true" (i.e., is an author),
and also "one does not know about $\neg A$ and C". So, the fact that C is a possible
author is correctly expressed by $(\neg K(+C) \wedge \neg K(-C))$, which solves the first
problem presented in Section 3.1.

On the contrary, the second problem is not solved because $\neg A$ is proved
possible rather than false as expected. The reason is that a Kripke structure $(w, R)$
where $R(w) = \emptyset$ is considered as a model of $K(A \wedge B)$, which means that an
impossible object is considered. This is not convenient in our Logical Information
System, where objects do exist in the real world. To exclude these empty models,
we just add a condition of seriality on the accessibility relation [Bow79] (a
relation $R$ is serial iff $\forall w : \exists w' : w R w'$), which
forces any world to have at least one successor: we obtain logic KD-ONL.

Definition 4 Semantics and entailment of logic KD-ONL are defined as in
Definitions 1 and 2, except that the accessibility relation is only required to be serial.
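
The conditions on the accessibility relation that distinguish K45-ONL, K-ONL and KD-ONL can be checked mechanically on finite structures. The following is only an illustrative sketch: the world names and the helper functions `successors`, `is_serial`, `is_transitive` and `is_euclidean` are assumptions introduced here, not part of the paper.

```python
def successors(relation, w):
    """Set R(w) of worlds accessible from w."""
    return {v for (u, v) in relation if u == w}

def is_serial(worlds, relation):
    """Every world has at least one successor."""
    return all(successors(relation, w) for w in worlds)

def is_transitive(relation):
    """wRw' and w'Rw'' imply wRw''."""
    return all((u, x) in relation
               for (u, v) in relation for (v2, x) in relation if v == v2)

def is_euclidean(relation):
    """wRw' and wRw'' imply w'Rw''."""
    return all((v, x) in relation
               for (u, v) in relation for (u2, x) in relation if u == u2)

worlds = {"w0", "w1", "w2"}

# Arbitrary relation (K-ONL): R(w2) is empty, the "impossible object" case.
r_k = {("w0", "w1"), ("w1", "w2")}

# Adding a successor to w2 makes the relation serial (KD-ONL).
r_kd = r_k | {("w2", "w2")}

# Constant world set R(w) = {w1, w2}: transitive and euclidean (K45-ONL).
r_k45 = {(w, v) for w in worlds for v in ("w1", "w2")}
```

In this sketch `r_k` fails the seriality test, `r_kd` passes it without being transitive or euclidean, and `r_k45` satisfies all three conditions.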

This time, we get the expected entailment: "one knows that $\neg A$ is false".

Proposition 2 $O[A \wedge B, \bot] \models_{KD\text{-}ONL}
K(+A) \wedge K(-\neg A) \wedge (\neg K(+C) \wedge \neg K(-C))$.



3.3 Idiomatics for Incomplete Knowledge 

As for complete knowledge, we introduce some idiomatics, by extending the
idiomatics of Section 2.3:

description: $[d, d'] = O[d, d'] = O(Kd \wedge N\neg(d \wedge d'))$, where $d$ and $d'$ are
non-modal formulas, represents a kind of knowledge interval where $d$ represents
what is known as true, and $d'$ represents what is known as possible, all the
rest being considered as implicitly false. A complete knowledge $d$ can also
be represented as $[d] = [d, \top] = OOd$.

query: a proposition whose atoms are either $+q = K(+q)$, or $-q =
K(-q)$, or $?q = (\neg K(+q) \wedge \neg K(-q))$, $q$ being a non-modal formula (on the
right-hand sides, $+$ and $-$ denote the idioms of Section 2.3): e.g.,
$(+(A \wedge \neg B) \vee ?C) \wedge -D$.

In the following, we use only these new definitions of idioms. The following 
lemma characterizes the meaning of these idiomatics by relating them to the 
non-modal propositional logic. 

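Lemma 3 Let $\models_{\mathcal{L}}$ be a deduction relation on non-modal formulas
(propositions). For all non-modal formulas $d$, $d'$, $q$:

1. $[d, d'] \models_{KD\text{-}ONL} +q$ iff $d \models_{\mathcal{L}} q$;

2. $[d, d'] \models_{KD\text{-}ONL} -q$ iff $d \wedge q \models_{\mathcal{L}} \bot$ or $d \wedge d' \not\models_{\mathcal{L}} q$;

3. $[d, d'] \models_{KD\text{-}ONL} ?q$ otherwise.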



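Proof 3

(1) $[d, d'] \models_{KD\text{-}ONL} +q$

$\Leftrightarrow O(Kd \wedge N\neg(d \wedge d')) \models_{KD\text{-}ONL} KKq$

$\Leftrightarrow \forall (w, R) : (w, R) \models O(Kd \wedge N\neg(d \wedge d')) \Rightarrow (w, R) \models KKq$ (Definition 2)

$\Leftrightarrow \forall (w', R) : (w', R) \models Kd$ and $(w', R) \models N\neg(d \wedge d') \Rightarrow (w', R) \models Kq$
(semantics of $O$, $K$, and $\wedge$)

$\Leftrightarrow \forall (w', R) : R(w') \subseteq W_R(d)$ and $R(w') \supseteq W_R(d \wedge d') \Rightarrow R(w') \subseteq W_R(q)$
(Lemma 1)

$\Leftrightarrow \forall R : W_R(d) \subseteq W_R(q)$ (take $R(w') = W_R(d)$)

$\Leftrightarrow d \models_{\mathcal{L}} q$ ($d$, $q$ are non-modal formulas).

(2) similar to proof (1).

(3) $[d, d'] \models_{KD\text{-}ONL} ?q$

$\Leftrightarrow O(Kd \wedge N\neg(d \wedge d')) \models_{KD\text{-}ONL} \neg KKq \wedge \neg K\neg Kq$

$\Leftrightarrow \forall (w, R) : (w, R) \models O(Kd \wedge N\neg(d \wedge d')) \Rightarrow (w, R) \not\models KKq$ and $(w, R) \not\models K\neg Kq$
(Definitions 2 and 1)

$\Leftrightarrow \forall (w, R) : (w, R) \models O(Kd \wedge N\neg(d \wedge d')) \Rightarrow \exists w'_1 \in R(w) : (w'_1, R) \not\models Kq$
and $\exists w'_2 \in R(w) : (w'_2, R) \models Kq$ (semantics of $K$ and $\neg$)

$\Leftrightarrow \forall w : \exists (w'_1, R_1) : (w'_1, R_1) \models Kd$, $(w'_1, R_1) \models N\neg(d \wedge d')$, $(w'_1, R_1) \not\models Kq$
and $\exists (w'_2, R_2) : (w'_2, R_2) \models Kd$, $(w'_2, R_2) \models N\neg(d \wedge d')$, $(w'_2, R_2) \models Kq$
(take $R_i(w) = \{w' \mid (w', R_i) \models Kd \wedge N\neg(d \wedge d')\}$, for $i \in \{1, 2\}$)

$\Leftrightarrow \exists (w'_1, R_1) : R_1(w'_1) \subseteq W_{R_1}(d)$, $R_1(w'_1) \supseteq W_{R_1}(d \wedge d')$, $R_1(w'_1) \not\subseteq W_{R_1}(q)$
and $\exists (w'_2, R_2) : R_2(w'_2) \subseteq W_{R_2}(d)$, $R_2(w'_2) \supseteq W_{R_2}(d \wedge d')$, $R_2(w'_2) \subseteq W_{R_2}(q)$
(Lemma 1)

$\Leftrightarrow^{(*)} \forall R : W_R(d) \not\subseteq W_R(q)$ and $W_R(d) \cap W_R(q) \neq \emptyset$ and $W_R(d \wedge d') \subseteq W_R(q)$
($(*)$ use seriality for $W_R(d) \cap W_R(q) \neq \emptyset$ and non-modality of $d$, $d'$, $q$; take
$R_1(w'_1) = W_R(d)$ and $R_2(w'_2) = W_R(d) \cap W_R(q)$, which are non-empty because
of seriality)

$\Leftrightarrow d \not\models_{\mathcal{L}} q$ and $d \wedge q \not\models_{\mathcal{L}} \bot$ and $d \wedge d' \models_{\mathcal{L}} q$

$\Leftrightarrow [d, d'] \not\models_{KD\text{-}ONL} +q$ and $[d, d'] \not\models_{KD\text{-}ONL} -q$. ∎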



Lemma 3 proves that the above idiomatics are exhaustive because a query $q$ is
always either true ($+q$), false ($-q$), or possible ($?q$) with regard to a description
of the form $[d, d']$. We also see that $?$ is disjoint from $+$ and $-$, but a query can
be both true and false in the special case where $d \models_{\mathcal{L}} \bot$, i.e., the description
is contradictory. In practice, in information systems, object descriptions are kept
consistent by integrity constraints. Finally, the idiomatics presented in this section
offer a simple way to implement logic KD-ONL by relying only on a non-modal
propositional prover.
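
As an illustration of this last remark, the three cases of Lemma 3 can be decided with nothing more than a propositional entailment test. The sketch below is only one possible reading, under stated assumptions: formulas are modelled as Python predicates over truth assignments, the atom set `ATOMS` is fixed, and the names `entails` and `classify` are hypothetical; a brute-force truth table stands in for the non-modal prover.

```python
from itertools import product

ATOMS = ["A", "B", "C", "D"]  # assumed finite set of atoms

def entails(premise, conclusion):
    """Truth-table check of premise |= conclusion over ATOMS."""
    for values in product([False, True], repeat=len(ATOMS)):
        world = dict(zip(ATOMS, values))
        if premise(world) and not conclusion(world):
            return False
    return True

def falsum(world):
    """The contradiction: false in every world."""
    return False

def classify(d, d_prime, q):
    """Idioms among '+', '-', '?' entailed by the description [d, d'] for q."""
    result = set()
    if entails(d, q):                                   # Lemma 3, case 1
        result.add("+")
    d_and_q = lambda w: d(w) and q(w)
    d_and_dp = lambda w: d(w) and d_prime(w)
    if entails(d_and_q, falsum) or not entails(d_and_dp, q):  # case 2
        result.add("-")
    if not result:                                      # case 3
        result.add("?")
    return result

# Description [A and B, C] of Section 3.1.
d = lambda w: w["A"] and w["B"]
d_prime = lambda w: w["C"]
```

With this description, `classify(d, d_prime, lambda w: w["C"])` yields the possible case, whereas replacing `d_prime` by `falsum` gives the description $[A \wedge B, \bot]$ of Proposition 2, for which $\neg A$ is classified as false.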



4 Conclusion and Future Work 

Section 3.2 presents a generalized form of logic ONL that is parallel to standard
modal logics: a logic ONL is a modal logic extended with a new modal
operator $N$ that enables reasoning on unaccessible worlds, whereas the usual
modal operator $K$ enables reasoning on accessible worlds. Thus, as there is a
whole family of modal logics depending on various conditions on the accessibility
relation (AR), we get a whole family of ONL-logics. Even if it is already
known that logic ONL can be defined like a modal logic [Ros00], to our knowledge
only K45-ONL (transitive and euclidean AR) has been studied. In this
paper, we have studied K-ONL (any AR), then KD-ONL (serial AR), and we
have shown that they are more convenient for representing incomplete knowledge
because they enable several levels of knowledge. Future work is to explore more deeply
logic ONL both in its general and specific forms.

Recently, a tableau calculus has been proposed for logic K45-ONL [Ros00].
We think it would not be too difficult to extend it to any logic ONL by taking
inspiration from what is done for modal logics with tableaux [Mas94].

In Section 1, we present the arbitrariness of the logic used as an important
feature of our Logical Information Systems. A problem is that logic ONL fixes
the logic as soon as we want to represent knowledge. To combine ONL features
with genericity, our idea is to build an abstraction of logic ONL by making the
logic $(\mathcal{L}, \models_{\mathcal{L}})$ appearing in Lemmas 2 and 3 a logical parameter. We call such an
abstraction a logic functor, and we have already done this work for several logics such
as the propositional logic, which we abstracted over atoms. An ONL logic functor
would allow representing complete and incomplete knowledge as presented in
this paper, but with the non-modal part of descriptions and queries expressed
in a dedicated logic (e.g., regular expressions, intervals for dates, sets of valued
attributes for the bibliographical application).

Finally, we intend to study how to express integrity constraints in logic ONL
itself [Rei92], and how to design declarative revisions and updates in order to
integrate new knowledge facts about an object into its description, while
preserving its consistency, and without having to edit it by hand.

Acknowledgements. Warm thanks go to Olivier Ridoux for insightful discussions
and advice. His expertise in logic has been extremely helpful.
I am also thankful to Philippe Besnard, who introduced me to the logic All I
Know and has supported my work.

Abstract. In this paper we deal with the SAT problem in many-valued
logics, which is of relevant interest in many areas of Artificial Intelligence
and Computer Science. Regarding tractability issues, several works
have previously been published that solve some clausal many-valued
SAT problems in polynomial time. Our aim is to show that certain non-clausal
many-valued SAT problems can be solved in polynomial time too,
thus extending earlier results from the clausal framework to the
more general non-clausal one.



1 Introduction 

Solving the SAT problem in many-valued logics is an important challenge due to
its repercussions in many different areas of Computer Science such as Approximate
Reasoning, Hardware Design, Deductive Databases, Automated Software
Validation, Logic Programming, Knowledge Rule-Based Systems, etc. The
interest of considering many-valued logics instead of classical logic lies mainly
in the fact that many-valued logics can cope with certain aspects of uncertainty
that almost always exist in real-world applications. For a survey on many-valued
automated deduction issues, the reader can see [15].

In this paper we deal with the non-clausal signed logic SAT problem.
Signed logic is a kind of many-valued logic that is an extension of classical
logic in the following sense. Atoms in propositional bi-valued logic are noted
$p$ and $\neg p$. Knowing that the set of truth values is $\{0, 1\}$, these atoms could be
written differently as $\{1\}:p$ and $\{0\}:p$ respectively. Thus, in the general case, if
$N$ is the set of truth values, an atom in signed logic is denoted by $S:p$, where
$S \subseteq N$, and its negation by $(N \setminus S):p$.

Regular logic is a particular case of signed logic with two assumptions: 1) $N$
is a totally ordered set $(0, \ldots, 1)$, and 2) the set $S$ can be either of
positive polarity or negative polarity. Positive (resp. negative) polarity means
that $S$ is the interval of values between a given value $j \in N$ and
1 (resp. 0 and the given value $j$).

The satisfiability relation varies only at the literal level. Namely, an
interpretation $I$ satisfies $S:p$ iff $I(p) \in S$, and it satisfies a conjunction (resp.
disjunction) of formulas iff it satisfies each (resp. at least one) formula of the
conjunction (resp. disjunction).

The signed SAT problem has a particular relevance with respect to any
other many-valued SAT problem. This is because it has been proved in [13] that
a SAT problem expressed in any finite many-valued logic can be transformed
into an equivalent signed SAT problem in polynomial time. This means that a
solver for the signed SAT problem can act as a general many-valued SAT solver.
Indeed, in order to solve a many-valued SAT problem, one could first transform
the problem into a signed SAT problem and then apply the signed SAT solver.
Thus, advances in solving the signed SAT problem have direct consequences on
solving any finite many-valued SAT problem.

Our work. Our purpose is to prove the polynomiality of the satisfiability
problem for non-clausal regular formulas of the kind $\Gamma = \Delta_1 \wedge \Delta_2 \wedge \ldots \wedge \Delta_n$, where
the $\Delta_i$'s are more general formulas than simple clauses.

Thus, if we denote by $D^-$ (resp. $C^+$) a disjunction (resp. conjunction) of regular
literals with negative (resp. positive) polarity and by $DNF^-$ (resp. $CNF^-$) a
disjunctive (resp. conjunctive) normal form formula composed of regular literals
with negative polarity, then the $\Delta$ elements of $\Gamma$ are Negation Normal Form
formulas of the kind $\Delta = DNF^- \vee CNF^- \vee C^+$.

For the previous class of formulas, we provide a refutation complete calculus
and an efficient almost linear algorithm. Indeed, we prove that the mentioned
problem can be solved in $O(n \log n)$ time.

It can be noticed that the considered formulas present a Horn-like structure.
Indeed, the identified formulas are compact representations of Horn theories in
the sense that they represent the same logical theory that could be represented
by Horn formulas, but with considerably fewer symbols. In a favourable case, the
reduction rate can be exponential. Thus, the identified formulas are of relevant
interest, for instance, in applications arising from Knowledge Rule-Based Systems,
where non-clausal formulas are a natural expression of the real problems.

To solve the SAT problem in non-clausal form, a known general principle is
to translate the problem into clausal form. However, this method has
severe drawbacks already pointed out in the literature related to this topic.
Indeed, two transformations are known: one preserves logical equivalence and
the other only satisfiability.

1. In the first case, the translation cannot avoid the explosion of the number
of symbols due to the $\wedge/\vee$ distribution operation, and thus the size of the
resulting CNF formula can increase exponentially.

2. The other approach consists in modifying the formula by introducing artificial
literals [13] aiming at preserving the satisfiability relation. This second
line of solution has two strong drawbacks: first, the logical equivalence
relation is lost, which could be unacceptable for certain applications, and second, the
size of the derived formula increases polynomially [13], significantly reducing
the efficiency of the approach.

Hence, processing the non-clausal formula directly in an appropriate way arises
as the most efficient approach to solving non-clausal many-valued SAT problems.
This paper is structured as follows. Firstly, we define the syntax and semantics
of the non-clausal regular formulas dealt with here. Afterwards, we define the
Logical Calculus. In Section 4, we give a correct quadratic algorithm. Next,
we design an almost linear algorithm for the non-clausal regular SAT problem.
Finally, we review the related work.

2 Regular Many-Valued Γ-Formulas

The first four definitions describe the syntax and semantics of regular formulas. 
A more detailed description about these concepts can be found in [8,12,14]. 

Definition 1. Signed formulas. Let $N$ be a finite set of truth values, $S$ a
subset of $N$ ($S \subseteq N$) and $p$ a proposition. An expression of the form $S:p$ is a
signed literal and $S$ is its sign. Given a signed literal $S:p$ and a set of truth values
$N$, $(N \setminus S):p$ denotes the complement of $S:p$. A signed clause is a disjunction
of signed literals. A signed formula is a conjunction of signed clauses.

Definition 2. Interpretation and satisfiability. An interpretation $I$ is a
mapping that assigns to every proposition a value in the set of truth values $N$.
An interpretation $I$ satisfies a signed literal $S:p$ iff $I(p) \in S$. An interpretation $I$
satisfies a signed clause iff $I$ satisfies at least one of its signed literals. A signed
formula $\Gamma$ is satisfiable iff there exists at least one interpretation that satisfies all
the signed clauses in $\Gamma$. A signed formula that is not satisfiable is unsatisfiable.
The empty signed clause $\Box$ is unsatisfiable and the empty signed formula $\Gamma = \{\}$
is satisfiable.

Definition 3. Regular sign. Let $\uparrow i$ denote the set $\{j \in N \mid j \geq i\}$ and $\downarrow i$ the
set $\{j \in N \mid j \leq i\}$, where $N$ is the set of truth values, $\leq$ a linear order on $N$
and $i \in N$. If a sign $S$ is equal to either $\uparrow i$ or $\downarrow i$, then it is a regular sign. A
signed literal $S:p$ has positive (resp. negative) polarity if $S = \uparrow i$ (resp. $S = \downarrow i$).

Definition 4. Regular formulas. Let $R$ be a regular sign. A regular literal
is a signed literal whose sign is regular. A regular clause $C$ is a disjunction of
regular literals $C = R_1 : p_1 \vee R_2 : p_2 \vee \ldots \vee R_m : p_m$. A regular Horn clause is
a regular clause with at most one regular literal with positive polarity. A regular
unit clause is a regular clause containing only one literal. A regular formula is
a conjunction of regular clauses.
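
The regular signs of Definition 3 can be made concrete with a small sketch. The truth-value set `N` below (eleven equally spaced values between 0 and 1) and the function names `up`, `down`, `satisfies` and `clash` are assumptions chosen for illustration. The sketch also checks that two literals $\uparrow i : p$ and $\downarrow j : p$ on the same proposition cannot be satisfied together exactly when $i > j$.

```python
from fractions import Fraction

# Assumed truth-value set: 0, 1/10, ..., 1 (exact arithmetic via Fraction).
N = [Fraction(k, 10) for k in range(11)]

def up(i):
    """Regular sign of positive polarity: {j in N | j >= i}."""
    return {j for j in N if j >= i}

def down(i):
    """Regular sign of negative polarity: {j in N | j <= i}."""
    return {j for j in N if j <= i}

def satisfies(interpretation, sign, p):
    """An interpretation satisfies S:p iff I(p) is in S."""
    return interpretation[p] in sign

def clash(i, j):
    """True iff no truth value satisfies both up(i):p and down(j):p."""
    return not (up(i) & down(j))

I = {"p": Fraction(7, 10)}
```

Here `satisfies(I, up(Fraction(6, 10)), "p")` holds while `satisfies(I, down(Fraction(2, 10)), "p")` does not.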

Now, we can describe the many-valued non-clausal formulas, called Γ-formulas,
that are a non-clausal extension of the regular Horn formulas. They
allow representing the same theories as regular Horn formulas but with
fewer literals. It can be easily proved that the reduction in the number of
symbols can be exponential. Thus, on the same problems, the proposed algorithm,
which runs with almost linear complexity on non-clausal formulas, can be
exponentially faster than the polynomial algorithms [8,14] for regular Horn
formulas, because of the exponential ratio between the sizes of their inputs.

Definition 5. A negative disjunction, noted $D^- = (\downarrow i_1 : p_1 \vee \downarrow i_2 : p_2 \vee \ldots \vee
\downarrow i_n : p_n)$, is a disjunction of literals with negative polarity. A negative conjunctive
normal form, noted $CNF^-$, is a conjunction of negative disjunctions. A negative
conjunction $C^- = (\downarrow i_1 : p_1 \wedge \downarrow i_2 : p_2 \wedge \ldots \wedge \downarrow i_n : p_n)$ is a conjunction of
literals with negative polarity. A negative disjunctive normal form $DNF^-$ is a
disjunction of negative conjunctions.



Definition 6. Γ-formulas. A subformula $\Delta$ is a disjunction of three optional
terms $\Delta = DNF^- \vee CNF^- \vee C^+$ where $DNF^- = C_1^- \vee C_2^- \vee \ldots \vee C_k^-$ is a negative
disjunctive normal form, $CNF^- = D_1^- \wedge D_2^- \wedge \ldots \wedge D_m^-$ is a negative conjunctive
normal form and $C^+ = (\uparrow i_1 : p_1 \wedge \ldots \wedge \uparrow i_n : p_n)$ is a regular conjunction with
positive polarity. A Γ-formula is a finite conjunction of formulas $\Delta$. $\Gamma_\Box$ denotes
any Γ-formula containing the empty clause; hence it is unsatisfiable.



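Example 1. The formula below is an unsatisfiable Γ-formula:

$\Gamma = \{\Delta_1 = (\uparrow 0.7 : p_1),$

$\Delta_2 = (\uparrow 0.6 : p_3),$

$\Delta_3 = (\uparrow 0.8 : p_6),$

$\Delta_4 = (((\downarrow 0.2 : p_1 \wedge \downarrow 0.1 : p_2) \vee \downarrow 0.15 : p_3) \vee
((\downarrow 0.25 : p_4 \vee \downarrow 0.4 : p_5) \wedge \downarrow 0.2 : p_6) \vee
(\uparrow 0.8 : p_7 \wedge \uparrow 0.7 : p_8)),$

$\Delta_5 = (\downarrow 0.1 : p_8)\}$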

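The formula can be rewritten more intuitively as a Knowledge Rule-Based
System (KBRS). Thus, $\Delta_1$ to $\Delta_3$ are the facts, $\Delta_4$ is the unique implication rule
and $\Delta_5$ comes from the query. Hence, $\Delta_4$ can be stated as:

$((\uparrow 0.2 : p_1 \vee \uparrow 0.1 : p_2) \wedge (\uparrow 0.15 : p_3) \wedge ((\uparrow 0.25 : p_4 \wedge \uparrow 0.4 : p_5) \vee (\uparrow 0.2 : p_6)))
\rightarrow (\uparrow 0.8 : p_7 \wedge \uparrow 0.7 : p_8)$

It can be checked that $\Delta_4$ is equivalent to 8 regular Horn clauses or, what is
the same, to 8 simple implicational rules. For instance, two of them are:

$\uparrow 0.2 : p_1 \wedge \uparrow 0.15 : p_3 \wedge \uparrow 0.25 : p_4 \wedge \uparrow 0.4 : p_5 \rightarrow \uparrow 0.8 : p_7$

$\uparrow 0.1 : p_2 \wedge \uparrow 0.15 : p_3 \wedge \uparrow 0.2 : p_6 \rightarrow \uparrow 0.7 : p_8$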



This simple example shows how our richer language makes it possible to reduce
the size of the KBRS by up to an exponential factor. Later on, we will show that the
time required to solve the associated SAT problem can also be reduced exponentially
compared to its counterpart clausal regular SAT problem [8,14].
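
The count of 8 Horn rules for the rule of Example 1 can be reproduced by distributing the disjunctions of the rule body and splitting the conjunctive head. The sketch below is illustrative only: literals are encoded as plain strings such as `"up0.2:p1"` (an assumed encoding of $\uparrow 0.2 : p_1$), and `horn_rules` is a hypothetical helper.

```python
from itertools import product

# Body of the rule: a conjunction of factors, each factor being a
# disjunction of alternatives, each alternative a conjunction of literals.
body = [
    [["up0.2:p1"], ["up0.1:p2"]],
    [["up0.15:p3"]],
    [["up0.25:p4", "up0.4:p5"], ["up0.2:p6"]],
]
# Head of the rule: a conjunction of positive literals.
heads = ["up0.8:p7", "up0.7:p8"]

def horn_rules(body, heads):
    """One Horn rule per choice of alternative in each factor and per head literal."""
    rules = []
    for choice in product(*body):
        premises = tuple(lit for conj in choice for lit in conj)
        for head in heads:
            rules.append((premises, head))
    return rules

rules = horn_rules(body, heads)
```

The number of rules is the product of the numbers of alternatives per factor times the number of head literals (here 2 x 1 x 2 x 2), which is where the exponential gap between the two representations comes from when many factors offer several alternatives.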

3 Logical Calculus 



Below we give the four inference rules forming our logical calculus. The
first three are generalisations of Regular Unit Resolution (GRUR) to
our non-clausal Horn-like language, and the fourth one is the And-Elimination
rule of the Natural Deduction Calculus extended to the Regular
language (RAE). Our regular non-clausal unit resolution rules can be seen either
as generalisations of the regular clausal unit resolution rule [5,16] or as particular
cases of the extension to the regular language of the non-clausal propositional
resolution given in [17].

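Definition 7. Logical Calculus. For $i > j$, the inference rules are:

$$\frac{(\uparrow i : p), \quad ((\downarrow j : p \wedge \downarrow j_1 : p_1 \wedge \ldots \wedge \downarrow j_n : p_n) \vee C_2^- \vee \ldots \vee C_k^- \vee CNF^- \vee C^+)}{(C_2^- \vee \ldots \vee C_k^- \vee CNF^- \vee C^+)} \quad \text{(GRUR1)}$$

$$\frac{(\uparrow i : p), \quad (((\downarrow j : p \vee \downarrow j_1 : p_1 \vee \ldots \vee \downarrow j_n : p_n) \wedge D_2^- \wedge \ldots \wedge D_m^-) \vee DNF^- \vee C^+)}{(((\downarrow j_1 : p_1 \vee \ldots \vee \downarrow j_n : p_n) \wedge D_2^- \wedge \ldots \wedge D_m^-) \vee DNF^- \vee C^+)} \quad \text{(GRUR2)}$$

$$\frac{(\uparrow i : p), \quad (((\downarrow j : p) \wedge D_2^- \wedge \ldots \wedge D_m^-) \vee DNF^- \vee C^+)}{(DNF^- \vee C^+)} \quad \text{(GRUR3)}$$

$$\frac{(\uparrow i_1 : p_1 \wedge \ldots \wedge \uparrow i_j : p_j \wedge \ldots \wedge \uparrow i_n : p_n)}{(\uparrow i_j : p_j)} \quad \text{(RAE)}$$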

As can be checked, the GRURi rules simplify the subformulas because
some factors are removed from them. The removals could transform a subformula
$\Delta = CNF^- \vee DNF^- \vee C^+$ into a positive formula $\Delta = C^+$. If that happens, the
RAE rule is applied. In the particular case where $C^+ = \{\}$, the unsatisfiability
of the formula is detected.

Theorem 1. Soundness. GRUR1, GRUR2, GRUR3 and RAE are sound,
namely $\Gamma \vdash_{GRUR_i,\ i=1,2,3} \Gamma' \Rightarrow \Gamma \models \Gamma'$ and $\Gamma \vdash_{RAE} \Gamma' \Rightarrow \Gamma \models \Gamma'$.

Definition 8. Refutation. A refutation of a formula $\Gamma$ is a succession of
formulas $\langle \Gamma_1, \Gamma_2, \ldots, \Gamma_n \rangle$ such that $\Gamma_1 = \Gamma$, $\Gamma_n = \Gamma_\Box$ and for each $1 \leq i \leq
n - 1$, $\Gamma_{i+1} = \Gamma_i \wedge \Delta_i$, where $\Delta_i$ is a subformula deduced by the application of
GRUR1, GRUR2 or GRUR3, or a conjunction of unit clauses derived by the
application of the RAE inference rule.



Theorem 2. Refutation Completeness. $\Gamma$ is unsatisfiable $\Rightarrow \Gamma \vdash \Gamma_\Box$.

The next steps of this section intend to show that the Logical Calculus can
be rewritten in a more algorithmic way with a twofold advantage:

- there exists an efficient implementation of the Logical Calculus;

- the correctness of the SAT algorithm follows straightforwardly from the correctness
of the Logical Calculus.




Extending Polynomiality 797 



Thus, by progressively and appropriately rewriting the Logical Calculus, the proof
of the correctness of the complex resulting SAT algorithm can be avoided, using
instead the correctness proof of the Logical Calculus, which is quite standard
and considerably easier.

If some abstraction is introduced in the form of the subformulas appearing
in the inference rules, the three GRUR rules can be rewritten as only two rules
by merging GRUR1 and GRUR3 into the new GRUR1 rule as follows:

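Definition 9. Logical Calculus.

$(\uparrow i : p),\ ((\downarrow j : p \wedge \Delta_1) \vee \Delta_2),\ i > j \ \vdash_{GRUR1}\ \Delta_2$

$(\uparrow i : p),\ (((\downarrow j : p \vee \Delta_1) \wedge \Delta_2) \vee \Delta_3),\ i > j \ \vdash_{GRUR2}\ ((\Delta_1 \wedge \Delta_2) \vee \Delta_3)$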

Although the Logical Calculus given above is sound and complete, observe
that the deduced subformulas have to be copied, and this incurs a computational
cost increasing quadratically with the number of Generalised Unit Resolutions.
Thus, the next step aims to formalise the Logical Calculus algorithm by
avoiding copies of subformulas. We note by $\Gamma \cdot \{\Delta_1 \leftarrow \Delta_2\}$ the formula resulting
from substituting in $\Gamma$ a subformula $\Delta_1$ by another subformula $\Delta_2$.

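Definition 10. Logical calculus.

$(\uparrow i : p),\ \Gamma,\ i > j \ \vdash_{GRUR1}\ \Gamma \cdot \{((\downarrow j : p \wedge \Delta_1) \vee \Delta_2) \leftarrow \Delta_2\}$

$(\uparrow i : p),\ \Gamma,\ i > j \ \vdash_{GRUR2}\ \Gamma \cdot \{(((\downarrow j : p \vee \Delta_1) \wedge \Delta_2) \vee \Delta_3) \leftarrow ((\Delta_1 \wedge \Delta_2) \vee \Delta_3)\}$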

Notice that the Logical Calculus expressed in the previous format now allows
inferences to be performed without adding any copy of the subformulas of the
original formula. On the contrary, one can check that the size of the formula
decreases. From a logical point of view, nothing has changed and thus refutation
correctness is still warranted by the rewritten calculus.

Looking closer at the previous calculi, one can see that they could be expressed
in a more algorithmic way. Thus, the next definition of the inference rules is close,
on the one hand, to the previous Logical Calculus and, on the other hand, to the
first description of the SAT algorithm.

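Definition 11. Logical calculus. We note $Remove(\Gamma, \Delta)$ a function that
returns the formula $\Gamma$ after removing its subformula $\Delta$.

$(\uparrow i : p),\ \Gamma,\ i > j \ \vdash_{GRUR1}\ \Gamma \leftarrow Remove(\Gamma, (\downarrow j : p \wedge \Delta))$

$(\uparrow i : p),\ \Gamma,\ i > j \ \vdash_{GRUR2}\ \Gamma \leftarrow Remove(\Gamma, (\downarrow j : p))$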



Example 2. A proof of the unsatisfiability of the formula in the first example is: 



C — {Ai, A 2 , A 3 , A 4 , A 5 } — 

= {(tO.7 : pi), A 2 , A 3 , (((tO.2 : piA tO.l : P 2 )V t0.15 : P 3 )V 
((10.25 : P 4 V tO.4 : p 5 )A }0.2 : pe) V (tO .8 : pyA tO.7 : ps)), A 5 } 

^ GRURl {(tO-7 : Pi), A 2 , A 3 , ((to. 15 : P 3 )V 

((tO.25 : P 4 V tO.4 : ps)A tO.2 : pe)) V (tO .8 : pyA tO.7 : ps)), A 5 }= 
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= {(tO.7 : pi), (tO.6 : pa), ^3, (40.15 : pa)V 

(40.25 : P4V iO.4 : p^)A |0.2 : pe)) V (fO.S : prA tO.7 : ps)), ^5} 

^GRUR2 {(to. 7 : Pi), (to. 6 : p 3 ),As, 

((40.25 : P4V tO.4 : ps)A {0.2 : pe) V (tO.8 : prA tO.7 : ps)), ^5} = 

=140.7 :pi), 40.6: Pa), 40.8 :pe), 

((40.25 : P4V {0.4 : p^)A {0.2 : pe) V ({0.8 : pyA {0.7 : Ps)), /ia} 

^GRURi (40.7 : pi), 40.6 : pa), ({0.8 : P6)({0.8 : prA {0.7 : ps), /is} 

^RAE {({0.7 : pi), 40.6 : Pa), ({0.8 : pe), ({0.8 : pr), ({0.7 : ps), /is} 

={({0.7 : pi), 40.6 : Pa), ({0.8 : pe), ({0.8 : pr), ({0.7 : ps), ({0.1 : pg)} 

^GRURi {({0.7 : pi), 40.6 : pa), ({0.8 : pe), ({0.8 : pr), ({0.7 : pg), ({0.1 : pg), □} 



4 Algorithm Description 

When the formula $\Gamma$ has no positive subformulas, $\Gamma$ is trivially satisfiable.
So assume that some positive subformulas are present in $\Gamma$. Applying
the RAE rule to the positive subformulas in $\Gamma$ produces positive literals
$(\uparrow j : p)$. Hence, either the formula $\Gamma$ is satisfiable or
some unit positive literals can be deduced. The next step is to
apply the GRURi inference rules with the unit formulas. This process is repeated
until no more unit formulas are generated, or an empty clause is produced. In
the first case, the formula is satisfiable and in the second it is unsatisfiable.

The principle of the algorithm is the following. First, the regular literals in
the unit formulas are pushed onto a stack (function ApplyRAE($\Gamma$, Stack)). For
each regular literal in the Stack, the GRURi rules are applied iteratively (While
loop). If, as a consequence of the GRURi applications, some subformulas become
positive conjunctions, the RAE rule is applied, adding new literals to the Stack.
The process finishes when there are no more unit formulas in the stack or when
an empty clause is deduced. In the first case, the formula is satisfiable and in
the second it is unsatisfiable.

Algorithm 1

Apply-GRURi-RAE(Γ)
1 ApplyRAE(Γ, Stack)
2 While Stack ≠ {} do:
3     (↑i : p) ← pop(Stack)
4     Remove all conjunctions (↓j : p ∧ Δ) s.t. i > j from Γ
5     Remove all literals (↓j : p) s.t. i > j from Γ
6     ApplyRAE(Γ, Stack)
7 EndWhile
8 If {} ∈ Γ then return(UNSAT) else return(SAT)

Lines 4 and 5 are straightforward applications of the GRUR1 and GRUR2
inference rules described in Definition 11, and lines 1 and 6 represent a direct
application of the RAE rule defined in Definition 7.
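
Algorithm 1 can be sketched directly on a naive representation of Γ-formulas. This is a hedged illustration rather than the authors' implementation: each subformula $\Delta$ is assumed to be a dictionary with the optional terms `"dnf"` (list of conjunctions), `"cnf"` (list of disjunctions) and `"pos"` (list of positive literals), a literal is a pair `(value, proposition)`, an empty list stands for an absent or removed term, and the names `example_1`, `apply_rae` and `apply_grur_rae` are hypothetical.

```python
def example_1(with_p6_fact=True):
    """Gamma of Example 1; dropping the fact on p6 gives a satisfiable variant."""
    gamma = [
        {"dnf": [], "cnf": [], "pos": [(0.7, "p1")]},
        {"dnf": [], "cnf": [], "pos": [(0.6, "p3")]},
        {"dnf": [[(0.2, "p1"), (0.1, "p2")], [(0.15, "p3")]],
         "cnf": [[(0.25, "p4"), (0.4, "p5")], [(0.2, "p6")]],
         "pos": [(0.8, "p7"), (0.7, "p8")]},
        {"dnf": [[(0.1, "p8")]], "cnf": [], "pos": []},
    ]
    if with_p6_fact:
        gamma.insert(2, {"dnf": [], "cnf": [], "pos": [(0.8, "p6")]})
    return gamma

def apply_rae(gamma, stack, done):
    """Push the literals of each subformula reduced to C+ (once per subformula).

    Returns False when an empty clause is found, True otherwise."""
    for idx, delta in enumerate(gamma):
        if idx in done or delta["dnf"] or delta["cnf"]:
            continue
        done.add(idx)
        if not delta["pos"]:
            return False
        stack.extend(delta["pos"])
    return True

def falsified(i, p, j, q):
    """The unit (up i : p) falsifies the literal (down j : q)."""
    return q == p and i > j

def apply_grur_rae(gamma):
    """Decide satisfiability; note that gamma is simplified in place."""
    stack, done = [], set()
    if not apply_rae(gamma, stack, done):
        return "UNSAT"
    while stack:
        i, p = stack.pop()
        for delta in gamma:
            # Line 4: remove every conjunction containing a falsified literal.
            delta["dnf"] = [c for c in delta["dnf"]
                            if not any(falsified(i, p, j, q) for j, q in c)]
            # Line 5: remove falsified literals from the disjunctions; an
            # emptied disjunction falsifies (removes) the whole CNF term.
            cnf = [[(j, q) for j, q in d if not falsified(i, p, j, q)]
                   for d in delta["cnf"]]
            delta["cnf"] = [] if any(not d for d in cnf) else cnf
        if not apply_rae(gamma, stack, done):
            return "UNSAT"
    return "SAT"
```

Each popped unit literal triggers a scan of the whole formula, which is the cost that the data structures introduced next are meant to avoid.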
Theorem 3. Correctness. Alg. 1 returns "UNSAT" iff $\Gamma$ is unsatisfiable.

This proof is a direct consequence of the correctness of the Logical Calculus.
The next goal is to optimise the complexity of the algorithm. For this purpose,
we first need to specify the data structure.

Data Structure. We note [X] a pointer to the object X. To each proposition $p$,
we associate two sets of pointers Neg.C(p) and Neg.D(p).
Neg.C(p) is a list of triplets (j, [C], [Δ]) such that $(\downarrow j : p)$ is a regular literal
in the conjunction C of the subformula Δ. Similarly, Neg.D(p) stores triplets
(j, [D], [Δ]) corresponding to disjunctions D containing a literal $(\downarrow j : p)$.

For each subformula Δ, a counter Counter.DNF(Δ) is provided. Each decrement
of Counter.DNF(Δ) represents the removal of a falsified conjunction in Δ.

For each disjunction $D^-$ in the $CNF^-$ terms, a counter Counter(D) is provided.
A decrement of Counter(D) indicates the removal of a falsified literal in
$D^-$. If one of these counters reaches 0, the whole $CNF^-$ term is
falsified.

Remark. Notice that we do not need to know which conjunction has been
removed; we only need to know how many conjunctions have been removed
in order to detect when the whole $DNF^-$ term has been removed. The same holds for
the removal of literals in disjunctions $D^-$.

Algorithm 2

Apply-GRURi-RAE(Γ)
1 ∀(↑k : q) ∈ C+ ∈ Γ: push((↑k : q), Stack)
2 While Stack ≠ {} do:
3     (↑i : p) ← pop(Stack)
4     ∀(j, [C], [Δ]) ∈ Neg.C(p) do: If i > j then decrement Counter.DNF(Δ)
5     ∀(j, [D], [Δ]) ∈ Neg.D(p) do: If i > j then decrement Counter(D, Δ)
6     ∀(j, [D], [Δ]) ∈ Neg.D(p) do:
7         If Counter(D, Δ) = 0 and Counter.DNF(Δ) = 0 then:
8             ∀(↑k : q) ∈ C+ ∈ Δ do: push((↑k : q), Stack)
9 EndWhile
10 If {} ∈ Γ then return(UNSAT) else return(SAT)

Remark. For reasons of clarity, in the design of the algorithm we have
assumed that the $CNF^-$ term of each subformula Δ exists. The consideration
of other cases is a mere question of implementation details regarding only line 6.



It can be proved that the complexity of the algorithm is quadratic, but the
algorithm is not correct yet. Indeed, each decrement of Counter.DNF(Δ)
must correspond to the falsification of one conjunct of the $DNF^-$ term. This
counter should reach 0 only when all the conjuncts are falsified. However, in
the previous algorithm the deduction of $n$ literals that belong to the same
conjunct of the $DNF^-$ term implies $n$ decrements of Counter.DNF(Δ). Thus,
the counter could reach 0, indicating that the $DNF^-$ term has been removed,
without all the conjuncts of the $DNF^-$ having been falsified.
To overcome this problem we use a flag called First(C) for each conjunct $C^-$
in the $DNF^-$. This flag is initially set to True and, after the first falsification
of a literal in $C^-$, it is set to False. In this way, only one decrement of
Counter.DNF is allowed for each conjunct $C^-$ in $DNF^-$.

Thus, to correct the previous algorithm, only its line 4 should be changed
to include the test of the flag First(C), as follows:

4 ∀(j, [C], [Δ]) ∈ Neg.C(p) do:
      If i > j and First(C) = True
      then decrement Counter.DNF(Δ)
           First(C) ← False
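
The effect of the First(C) flag can be seen on a tiny case. The sketch below is only illustrative: `remaining_dnf_counter` is a hypothetical helper that replays line 4 (with or without the flag) for a list of deduced unit literals, and returns the final value of the counter attached to a single $DNF^-$ term. Literals are assumed to be pairs `(value, proposition)`.

```python
def remaining_dnf_counter(dnf, units, use_first_flag):
    """Replay line 4 on one DNF term and return the final counter value.

    dnf   : list of conjunctions, each a list of negative literals (j, q)
    units : deduced positive unit literals (i, p)
    """
    counter = len(dnf)            # one unit per conjunct still alive
    first = [True] * len(dnf)     # First(C) flag of each conjunct
    for i, p in units:
        for c, conjunct in enumerate(dnf):
            for j, q in conjunct:
                if q == p and i > j:
                    if use_first_flag and not first[c]:
                        continue  # conjunct already counted as falsified
                    counter -= 1
                    first[c] = False
    return counter

# Two conjuncts; both deduced literals hit the first conjunct only.
dnf = [[(0.2, "p"), (0.1, "q")], [(0.3, "r")]]
units = [(0.5, "p"), (0.5, "q")]

naive = remaining_dnf_counter(dnf, units, use_first_flag=False)
flagged = remaining_dnf_counter(dnf, units, use_first_flag=True)
```

Without the flag the counter drops to 0 although the second conjunct is untouched; with the flag it stays at 1, as expected.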

Now, we can ensure the correctness of the algorithm. 

Theorem 4. Correctness. The algorithm Apply-RURi-RAE(F) returns “UN- 
SAT” iff F is satisfiable. 

The proof is a consequence of the correctness of Algorithm 1, which in turn
is a consequence of the soundness and completeness of the Logical Calculus.

Concerning the algorithm’s complexity, the last algorithm is of course more
efficient than the previous one, but it is nevertheless still quadratic.

Theorem 5. The complexity of the previous procedure is in O(k · m), where k
is the maximum number of subformulas including positive literals (↑i : p) with
the same proposition p and m is the maximum number of subformulas sharing
negative literals (↓j : p) with the same proposition p.

5 An Almost Linear Algorithm 

The aim of the following optimisation is to design a strictly linear main proce- 
dure. The non-linear complexity factor will be confined to only the Pre-process 
step. 

Ordering Neg.C(p) and Neg.D(p). Once the Neg.C(p) lists are obtained, we
sort them in ascending order of the value j of each literal (↓j : p). This
is done with a call to the well-known procedure MergeSort, namely Neg.C(p) ←
MergeSort(Neg.C(p)). The Neg.D(p) lists are processed identically.
Once the Neg.C(p) and Neg.D(p) lists are ordered, the removals of subformulas
can be performed more efficiently.

When a multi-valued literal (↑i : p) is deduced, the first pointer (j, [C⁻], [Δ])
to a subformula Δ in Neg.C(p) is considered, checking whether i > j. In the
affirmative case, the pointer is removed from Neg.C(p), the counter decrements
are executed, and the same check is carried out with the second pointer
in Neg.C(p).
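
The reason why sorted lists give a linear main procedure can be sketched in Python as follows. This is only an illustration under simplifying assumptions: pointers are reduced to pairs (j, subformula name), the names `build_neg_list` and `process_literal` are hypothetical, and Python's built-in sort is used in place of MergeSort (both run in O(n · log(n)) in the worst case).

```python
from collections import deque

def build_neg_list(pointers):
    """Pre-process step: sort (j, subformula) pointers by ascending j."""
    return deque(sorted(pointers, key=lambda ptr: ptr[0]))

def process_literal(i, neg_list, counters):
    """Handle a deduced literal with value i.

    Pointers are removed from the head of the list while i > j, so every
    pointer is inspected and removed at most once over the whole run.
    """
    removed = 0
    while neg_list and i > neg_list[0][0]:
        _, sub = neg_list.popleft()
        counters[sub] -= 1
        removed += 1
    return removed

neg = build_neg_list([(3, "D1"), (1, "D2"), (5, "D1")])
counters = {"D1": 2, "D2": 1}
r1 = process_literal(4, neg, counters)   # removes the pointers with j = 1 and j = 3
r2 = process_literal(2, neg, counters)   # head has j = 5, nothing is removed
r3 = process_literal(6, neg, counters)   # removes the last pointer
```

Since each call stops at the first pointer that fails the test, the total work over all deduced literals is bounded by the length of the list plus the number of calls.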

Apply-GRURi-RAE(F)

1 Initialization(F)
2 While Stack ≠ {} do:
3     (↑i : p) ← pop(Stack)
4     while i > Val(First.conjunction(Neg.C(p))) and First(C) = True do:
          Remove First.conjunction(Neg.C(p)) from Neg.C(p)
          Decrement Counter.DNF(Δ)
          First(C) ← False
5     while i > Val(First.disjunction(Neg.D(p))) do:
          Remove First.disjunction(Neg.D(p)) from Neg.D(p)
          Decrement Counter(D, Δ)
6     If Counter(D, Δ) = 0 and Counter.DNF(Δ) = 0 then:
7         ∀(↑k : q) ∈ Δ ∈ F do: push((↑k : q), Stack)
8 Endwhile
9 If {} ∈ F return(UNSAT) else return(SAT)



These operations are repeated until a check fails, at which point the
removal of pointers from Neg.C(p) stops. This process ensures
that the list Neg.C(p) is traversed at most once. The list Neg.D(p) is
processed identically.

The definitive algorithm is given above.

The correctness of this algorithm follows directly from that of the previous 
algorithm. 

The Initialisation algorithm is not given because it does not present any
particular difficulty: it only initialises the data structures used
by the main procedure. The only point to be stressed is that its complexity is
in O(n · log(m)), the worst-case complexity of the well-known MergeSort
algorithm, which is required to order the lists Neg.C(p) and Neg.D(p)
initially.

Theorem 6. The complexity of the main procedure is in O(n).

The proof follows from the previously explained optimisations of the steps of
this procedure.

Thus, the main procedure is strictly linear and the non-linear factor has been
confined to the initialization step.



6 Related Work 

We review successively the main works concerning non-clausal tractability and 
many-valued tractability. 

Non-clausal Tractability. Tractability has attracted much attention,
especially in classical logic. As far as we know, the first published results
concerning non-clausal tractability come from [6,7,10], where a strictly linear
bottom-up algorithm to test the satisfiability of a subclass of non-clausal
formulas is detailed. This class includes the Horn case as a particular case.
In [11] a linear top-down algorithm is given for the same non-clausal subclass
of formulas.

New results concerning non-clausal tractability are reported in [18], which
presents a method called Restricted Fact Propagation, a quadratic,
incomplete non-clausal inference procedure.

More recently, a significant advance in non-clausal tractability has been
accomplished in [19,20]. The author defines a class of formulas by extending
the Horn formulas to the field of non-clausal formulas. This extension relies
on the concept of polarity. In [19], a refutationally complete SLD-resolution
variant is presented, although its computational complexity is not
studied. In [20] a method for propositional and some many-valued non-clausal
Horn-like formulas is described, and it is stated that the method is sound,
incomplete and linear. However, concerning the last claim, no algorithm is
specified; the steps of the method are described as different propagations of
truth values in a sparse tree. Thus, although the number of inferences of the
proposed method seems to be linear, the resulting complexity (w.r.t. the
number of computer instructions) of a linear number of truth value
propagations on the sparse trees employed is not proved.

In [1,4,3] some non-clausal SAT problems are proved to be strictly linear.
These results are established by providing complete logical calculi and
correct linear algorithms.

Many-valued Tractability. The earliest work on this topic is due to [8], where
the SAT problem and other related problems for a sub-class of the Horn regular
logic are proved to be almost linear. In [14] the regular Horn problem is also
proved to be almost linear. Then, in [9] the 2-SAT problem is analysed, proving
that the regular 2-SAT and the special case of the signed 2-SAT in which all
the signs are singletons are polynomial. In [5] the regular Horn SAT problem
where the truth values form a finite lattice is proved to be polynomial.

In [20] the first many-valued non-clausal SAT problem that can be decided
in polynomial time was defined. Recently, another many-valued non-clausal
SAT problem with polynomial complexity has been identified [2]. The
many-valued logic and the non-clausal form studied there are sub-cases,
respectively, of the regular logic and the non-clausal form analysed in this
paper.



7 Conclusions 

Advances in the efficient solution of the many-valued SAT problem have
important repercussions in many areas of Computer Science. In this paper we
have proved that some non-clausal many-valued SAT problems can be solved
efficiently in O(n · log(n)) time. Thus we have generalised some existing
results about clausal tractability to the more general non-clausal framework.
The non-clausal formulas considered here could be of significant interest in
applications because they present a Horn-like structure. An important
advantage of the proposed method is that it does not need to transform the
original formula. Indeed, it processes the original formula, preserving in this
way all its logical properties, contrary to what happens when the formula is
transformed to clausal form by introducing artificial literals.

Abstract. This paper introduces a genetic algorithm for the satisfiability
problem in a probabilistic logic. A local search based improvement
procedure is integrated into the algorithm. A test methodology is presented
and some results are given. The results indicate that this approach could
work well. Some directions for further research are described.



1 Introduction 

Probabilistic logics are used to reason about uncertainty expressed in terms of
probability. Since the paper [19] many such logics have been developed (see
[6,7,20,21,22], and the references given therein). In those logics the classical
propositional language is expanded by expressions that speak about probability,
while formulas remain true or false. Some of the logics allow making statements
about higher order probabilities [7,21], while in the others [6,20,21] only
probabilities of classical formulas can be expressed. In this paper we consider
a logic of the latter type, namely the probabilistic logic about measurable
events from [6]. A sound and complete axiomatization and a decision procedure
for the satisfiability problem are given there (see also [20,21] for an
alternative approach). Satisfiability of a probabilistic formula can be reduced
to a linear programming problem. However, the number of variables in the linear
system corresponding to a formula is exponential in the number of primitive
propositions in the formula. This makes any standard linear system solving
procedure (Fourier-Motzkin elimination, for example) unsuitable in practice
when scaling up to larger formulas. That point was already argued, for example,
in [8], where a procedure for probabilistic deduction which can be stopped at
any time to yield partial information was suggested.

Genetic algorithms (GA, for short) are general problem solving methods
inspired by processes of natural evolution. The first GA’s appeared in the
early 1970s and were rigorously stated in [13]. GA’s can be applied in many
areas [2,10]. For example, GA’s are used to solve SAT, the satisfiability
problem for classical propositional logic (see [1,5,18], and the references
given therein), which is an NP-complete problem. We note that pure GA’s are
incomplete procedures for SAT. It is customary to combine GA’s with some
heuristic procedures to obtain more powerful (but still incomplete) methods.
Such an integration of a local search procedure into a GA for SAT is presented
in [17].

In this paper we try to apply similar ideas to attack the satisfiability
problem for the mentioned probabilistic logic, which is also NP-complete [6].
We describe here the first step in developing a satisfiability checker for the
probabilistic logic based on the GA approach as well as on a heuristic
procedure for local search. If the checker finds that a formula is satisfiable,
it also gives an actual model in which the formula holds. Our aim is to obtain
an efficient checker. Thus, even if the underlying procedure is only a
semi-decision procedure, it may still be preferable to a slow decision
procedure which is guaranteed to find a solution if one exists. We use a
set of formulas that are known to be satisfiable and measure the performance of
the algorithm by the percentage of solved formulas.

The rest of the paper is organized as follows. In Section 2 we give a brief 
description of the mentioned probabilistic logic. In Section 3 the paradigm of 
GA’s is presented, while in Section 4 we summarize how the general GA-approach 
is adapted to check satisfiability in the probabilistic logic. Section 5 contains 
some experimental results. We give concluding remarks and directions for further 
investigations in Section 6. 

2 Probabilistic Logic 

Starting from the set $\Phi = \{p, q, r, \ldots\}$ of primitive propositions,
the set of classical propositional formulas is obtained in the usual way. We
use $\alpha, \beta, \ldots$ to denote classical propositional formulas. A
literal is a primitive proposition or a negation of a primitive proposition. A
weight term is an expression of the form
$a_1 w(\alpha_1) + \ldots + a_n w(\alpha_n)$, where the $a_i$'s are rational
numbers and the $\alpha_i$'s are classical propositional formulas. The intended
meaning of $w(\alpha)$ is the probability of $\alpha$. A basic weight formula
has the form $t \ge c$, where $t$ is a weight term and $c$ is a rational
number. Finally, the set of all weight formulas contains all basic weight
formulas and is closed under Boolean operations. We use $f, g, \ldots$ to
denote weight formulas. The other forms of weight formulas can be defined as
abbreviations. For example, $(t < c)$ abbreviates $\neg(t \ge c)$. An
expression of the form $t \ge c$ or $t < c$ is called a weight literal.

Let $\alpha$ be a classical propositional formula and $\{p_1, \ldots, p_k\}$ be
the set of all primitive propositions that appear in $\alpha$. An atom of
$\alpha$ is defined as a formula $at = \pm p_1 \wedge \ldots \wedge \pm p_k$,
where $\pm p_i$ denotes either $p_i$ or $\neg p_i$. There are $2^k$ different
atoms of a formula containing $k$ primitive propositions. Let $At(\alpha)$
denote the set $\{at_1, \ldots, at_{2^k}\}$ of all atoms of $\alpha$. Every
classical propositional formula $\alpha$ is equivalent to formulas
$DNF(\alpha)$ and $CDNF(\alpha)$, called the disjunctive normal form and the
complete disjunctive normal form of $\alpha$, respectively, where
$CDNF(\alpha)$ is a disjunction of atoms of $\alpha$. We use
$at \in CDNF(\alpha)$ to denote that the atom $at$ appears in $CDNF(\alpha)$.
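
As a minimal sketch of these definitions, the following Python fragment enumerates the atoms over a list of primitive propositions and collects those that satisfy a formula, i.e. the atoms appearing in its complete disjunctive normal form. Representing formulas as Python callables over truth assignments, and the names `atoms` and `cdnf`, are assumptions made for illustration only.

```python
from itertools import product

def atoms(props):
    """All 2^k atoms over props, each encoded as a dict proposition -> bool."""
    return [dict(zip(props, bits))
            for bits in product([True, False], repeat=len(props))]

def cdnf(formula, props):
    """Atoms of the complete disjunctive normal form: those satisfying formula."""
    return [at for at in atoms(props) if formula(at)]

# Example formula: p OR (q AND NOT r), over three primitive propositions.
props = ["p", "q", "r"]
alpha = lambda v: v["p"] or (v["q"] and not v["r"])
all_atoms = atoms(props)
alpha_atoms = cdnf(alpha, props)
```

The enumeration is exponential in the number of propositions, which is exactly why the linear system discussed later becomes impractical for large formulas.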

The semantics of the logic is given using models similar to Kripke models.
As a consequence, probability formulas are not truth-functional, i.e.,
$w(\alpha) \ge c$ is not equivalent to any truth-function of $\alpha$.

Definition 1. A probabilistic model is a structure
$M = \langle W, H, \mu, v \rangle$ where:

— $W$ is a set of elements called worlds,

— $H$ is a $\sigma$-algebra of subsets of $W$,

— $\mu : H \to [0,1]$ is a $\sigma$-additive probabilistic measure, and

— $v : W \times \Phi \to \{\top, \bot\}$ is a valuation which associates with
every world $w \in W$ a truth assignment $v(w)$ on the primitive propositions.

The valuation $v$ is extended to a truth assignment on all classical
propositional formulas. Let $M = \langle W, H, \mu, v \rangle$ be a
probabilistic model and $\alpha$ a classical formula. The set
$\{w \in W : v(w)(\alpha) = \top\}$ is denoted by $[\alpha]_M$. A probabilistic
model $M$ is measurable if $H$ contains sets of the form $[\alpha]_M$ only,
i.e., if $[\alpha]_M$ is measurable for every classical formula $\alpha$ and
only sets of worlds definable by classical formulas are measurable. In the
sequel we shall consider the class of all measurable probabilistic models, and
omit the word measurable.

Definition 2. The satisfiability relation $\models$ fulfills the following
conditions for every probabilistic model $M = \langle W, H, \mu, v \rangle$:

1. if $\alpha$ is a classical formula, $M \models \alpha$ if for every world
$w \in W$, $v(w)(\alpha) = \top$,

2. $M \models a_1 w(\alpha_1) + \ldots + a_n w(\alpha_n) \ge c$ if
$\sum_{i=1}^{n} a_i \mu([\alpha_i]_M) \ge c$,

3. for every weight formula $f$, $M \models \neg f$ if $M \not\models f$, and

4. for all weight formulas $f$ and $g$, $M \models f \wedge g$ if
$M \models f$ and $M \models g$.

A set of formulas is satisfiable if there is a probabilistic model $M$ such
that $M \models A$ for every formula $A$ from the set. A formula $A$ is
satisfiable if the set $\{A\}$ is, while $A$ is valid if $M \models A$ for
every probabilistic model $M$. Every weight formula $f$ is equivalent to a
disjunctive normal form

$$DNF(f) = \bigvee_{i=1}^{m} \bigwedge_{j=1}^{k_i} \left( a_1^{i,j} w(\alpha_1^{i,j}) + \ldots + a_{n_{i,j}}^{i,j} w(\alpha_{n_{i,j}}^{i,j}) \; \rho_{i,j} \; c_{i,j} \right) \qquad (1)$$

where the disjuncts are conjunctions of weight literals, and each
$\rho_{i,j}$ is either $\ge$ or $<$. Thus, $f$ is satisfiable iff at least one
such conjunction of weight literals is satisfiable. Note that $f$ can be
transformed to $DNF(f)$ in polynomial time. We say that a formula $f$ is in
the weight conjunctive form (wfc-form, for short) if it is a conjunction of
weight literals. Now, the Probabilistic Satisfiability Problem (PrSAT, for
short) is the following problem: given a formula $f$ in wfc-form, is it
satisfiable?

It is proved that PrSAT is decidable [6]. The main idea of the proof is that
PrSAT can be reduced to a linear programming problem. Namely, every weight
formula $f$ in wfc-form can be transformed into a system of linear equalities
and inequalities containing:

$$\sum_{at \in At(f)} \mu(at) = 1,$$

$$\mu(at) \ge 0, \text{ for every } at \in At(f),$$

as well as an inequality of the form

$$a_1 \sum_{at \in CDNF(\alpha_1)} \mu(at) + \ldots + a_n \sum_{at \in CDNF(\alpha_n)} \mu(at) \; \rho \; c$$

for every weight literal $a_1 w(\alpha_1) + \ldots + a_n w(\alpha_n) \; \rho \; c$
which appears in $f$, such that $f$ is satisfiable iff the system is. Note that
any solution of the system defines a probability distribution over the set of
atoms $At(f)$. This is enough to guarantee that the probabilistic model whose
worlds are identified with the atoms from $At(f)$ that hold in them is
measurable. In [6] even more is proved, namely that PrSAT is NP-complete. That
follows from the statement that a system of L linear equalities and
inequalities has a nonnegative solution iff it has a nonnegative solution with
at most L positive entries such that the sizes of the entries are bounded
by a polynomial function of the size of the longest coefficient of the system.
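
To make the system above concrete, the following Python sketch checks whether a candidate distribution over atoms is a solution, i.e. whether it sums to 1, is nonnegative, and satisfies every weight literal. It only verifies a given candidate and does not solve the linear program. The encoding of weight literals as triples (term, relation, constant), of formulas as callables, and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def weight(term, mu, props):
    """Value of a_1 w(alpha_1) + ... + a_n w(alpha_n) under mu.

    term is a list of (coefficient, formula) pairs; mu maps atoms, encoded as
    tuples of booleans aligned with props, to probabilities. Atoms missing
    from mu have probability 0.
    """
    total = Fraction(0)
    for coef, formula in term:
        for bits, prob in mu.items():
            if formula(dict(zip(props, bits))):
                total += coef * prob
    return total

def satisfies(literals, mu, props):
    """True iff mu is a distribution that satisfies every weight literal."""
    if any(p < 0 for p in mu.values()) or sum(mu.values()) != 1:
        return False
    for term, rel, c in literals:
        value = weight(term, mu, props)
        if rel == ">=" and not value >= c:
            return False
        if rel == "<" and not value < c:
            return False
    return True

props = ["p", "q"]
mu = {(True, True): Fraction(1, 2),
      (True, False): Fraction(1, 4),
      (False, False): Fraction(1, 4)}
literals = [
    ([(1, lambda v: v["p"])], ">=", Fraction(3, 4)),        # w(p) >= 3/4
    ([(1, lambda v: v["p"] and v["q"]),
      (-1, lambda v: not v["p"])], "<", Fraction(1, 2)),    # w(p & q) - w(~p) < 1/2
]
ok = satisfies(literals, mu, props)
```

Exact rational arithmetic is used here so that the equality test on the sum of probabilities is not affected by rounding.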

3 Genetic Algorithms 

GA’s use populations of individuals. Each individual (also called a
chromosome) is seen as a possible solution in the search space of the
particular problem. Thus, a GA can be seen as a procedure searching for the
global optima of the corresponding problem. Individuals are represented by
genetic code over a finite alphabet. An evaluation function assigning fitness
values to individuals has to be defined. Fitness values indicate the quality of
the corresponding individuals, while the average fitness of an entire
population may be a good measure of the quality attained by the procedure.
GA’s consist of applications of genetic operators to populations, which should
ensure that average fitness values are continually improved from each
generation to the next. Basic genetic operators are selection, crossover and
mutation, but some additional operators such as inversion, local search, etc.,
may be used.

The selection mechanism favours highly fit individuals (as well as parts of
the genetic code of individuals, i.e. genes), giving them better chances of
reproduction into the next generations. On the other hand, the chances of
reproduction for less fit members are reduced, and they are gradually wiped
out from populations. The crossover operator partitions a population into a
set of pairs of individuals named parents. For each pair a recombination of
their genetic material is performed with some probability. In that way a
nondeterministic exchange of genetic material in populations is obtained.
Repeated use of the selection and crossover operators may cause the variety of
genetic material to be lost. It means that some areas of the search space
become unreachable. This usually causes convergence to local optima far from
the global optimal values. The mutation operator can help to avoid this
shortcoming. Parts of individuals (genes) can be changed with some small
probability to increase the diversity of genetic material. An initial
population is usually generated at random, although sometimes it may be fully
or partially produced by an initial heuristic. A general description of GA’s
is given in Figure 1, where Npop and p_i denote the number of individuals and
their objective values, respectively. The objective value of an individual
corresponds to the value which the individual has for the considered problem.
The while-loop is repeated until a finishing criterion (the global optimum is
found, the maximal number of iterations is reached, ...) is satisfied. Since
the procedure is not complete, if the maximal number of iterations is reached,
we do not know whether the considered problem is solvable.
HeuristicImprovement() can be optionally included to improve the efficiency of
the GA and/or to help the procedure to escape from local optima.



InputData();
PopulationInit();
while ( not FinishedGA() ) {
    for ( i = 0; i < Npop; i++ ) p_i = ObjectiveFunction();
    HeuristicImprovement();
    ComputeFitnesses();
    Selection();
    Crossover();
    Mutation();
}
OutputResults();



Fig. 1. A general description of GA’s 
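
The scheme of Figure 1 can be turned into a small runnable Python sketch. It is a generic illustration on bit strings and not the simulator used in the paper: binary tournament selection is swapped in for simplicity, no heuristic improvement step is included, and the function name `run_ga` and its parameters are assumptions.

```python
import random

def run_ga(fitness, n_bits, target, n_pop=10, p_cross=0.85, p_mut=0.03,
           max_gen=200, seed=0):
    """Generic GA loop on bit strings; returns the best individual found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_pop)]
    best = max(pop, key=fitness)
    for _ in range(max_gen):
        if fitness(best) >= target:          # finishing criterion: optimum found
            break
        # Selection: binary tournament (an assumption, not rank-based selection).
        parents = [max(rng.sample(pop, 2), key=fitness) for _ in range(n_pop)]
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]                # copy so parents are not modified
            if rng.random() < p_cross:       # one-point crossover
                cut = rng.randrange(1, n_bits)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            children += [a, b]
        for child in children:               # simple bit-flip mutation
            for k in range(n_bits):
                if rng.random() < p_mut:
                    child[k] ^= 1
        children[0] = best[:]                # elitism: keep one elite individual
        pop = children
        best = max(pop, key=fitness)
    return best

# Toy objective: maximise the number of ones in a 20-bit string.
best = run_ga(sum, n_bits=20, target=20)
```

Because one elite individual is always carried over, the best fitness never decreases from one generation to the next; reaching the maximal number of generations without hitting the target says nothing about whether a solution exists.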



The idea of caching is used to avoid repeated computation of the
same objective value [15]. It is especially important when the computation is
time-expensive. If the value of an individual has to be computed, and it is
already cached, we just read it from the cache. In this paper a simple but
efficient strategy called the Least Recently Used (LRU) strategy for caching
is used. It is implemented by a hash-queue data structure which saves the
individuals and the corresponding values. The queue size is a parameter which
depends on memory size and other performance constraints. A basic description
of LRU is given in Figure 2. It can be seen that after LRU caches a value, the
value cannot be removed until all other values in the cache have been used
more recently. LRU replaces ObjectiveFunction() from Figure 1.



if Belong( individual, CacheMemory ) SetValue( individual, CacheBlock );
else {
    p_i = ObjectiveFunction();
    if Full( CacheMemory ) Remove( CacheMemory, LRUCacheBlock );
    Put( CacheMemory, p_i );
}



Fig. 2. Description of LRU strategy 
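
The strategy of Figure 2 can be sketched in Python with an ordered dictionary playing the role of the hash-queue. The class name `LRUCache`, the miss counter, and the use of `collections.OrderedDict` are choices made for this illustration and are not taken from the paper.

```python
from collections import OrderedDict

class LRUCache:
    """Caches objective values; evicts the least recently used entry when full."""

    def __init__(self, capacity, objective):
        self.capacity = capacity
        self.objective = objective
        self.store = OrderedDict()       # oldest entry first, newest last
        self.misses = 0

    def value(self, individual):
        key = tuple(individual)
        if key in self.store:
            self.store.move_to_end(key)  # mark as most recently used
            return self.store[key]
        self.misses += 1
        result = self.objective(individual)
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)   # remove the LRU cache block
        self.store[key] = result
        return result

cache = LRUCache(capacity=2, objective=sum)
values = [cache.value(ind) for ind in ([1, 1], [1, 0], [1, 1], [0, 0], [1, 0])]
```

In this run the third lookup is a hit, while the last one is a miss because [1, 0] was evicted when [0, 0] was inserted.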



4 A Genetic Algorithm for PrSAT 

We have implemented our GA for PrSAT on top of a program which is a general
GA simulator [16]. The input of the program is a weight formula $f$ in wfc-form
with L weight literals of the form $t_i \; \rho_i \; c_i$. Without loss of
generality, we demand that classical formulas appearing in weight terms are in
disjunctive normal form. Let $\Phi(f) = \{p_1, \ldots, p_N\}$ denote the set of
all primitive propositions from $f$, and $|\Phi(f)| = N$. Recall that our goal
is to find a probabilistic model $M$ such that $M \models f$. As we already
noted, that model can be described as a probability distribution defined over
the set of atoms $At(f)$.

An individual M from the population consists of L pairs of the form (atom,
probability) that describe a probabilistic model. The first coordinate is given
as a bit string of length N, where 0 at position i denotes $\neg p_i$, while 1
denotes $p_i$. The second coordinate is a floating point number representing
the probability of the corresponding atom. The internal representation of an
individual is a bit string obtained by concatenating these pairs.

We define two evaluation functions that rank individuals. The first function
(t) gives the total number of weight literals in $f$ that are true for an
individual M. If t(M) is equal to L, the individual M is a solution. The second
function (d) measures the degree of unsatisfiability of an individual M. The
degree is defined as the distance between the left side values of the weight
literals of the form
$a_1^i w(\alpha_1^i) + \ldots + a_{n_i}^i w(\alpha_{n_i}^i) \; \rho_i \; c_i$
that are not satisfied in the model described by the individual M, and the
corresponding right side values:

d(M) =

Our program allows the size of the population, the types of genetic
operators, the corresponding probabilities, etc. to be given as input
parameters. The main features of our GA are as follows:

— the population consists of 10 individuals, 

— selection is performed using the rank-based operator, with the rank from 2.5 
for the best individual to 1.6 for the worst individual (the step is 0.1), 

— the crossover operator is one-point, with the probability 0.85, 

— the simple mutation operator is used with the probability 0.03, 

— the elitist strategy with one elite individual is used in the generation replace- 
ment scheme, 

— multiple occurrences of an individual are removed from the population,

— as an additional finishing criterion a measure of population homogeneity is
used, and when that measure exceeds a given threshold, the trial is finished, and

— the LRU strategy with the buffer containing at most 5000 individuals is used 
for caching. 
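
The rank-based selection listed above can be sketched as follows. The text only gives the rank values (2.5 for the best individual down to 1.6 for the worst, in steps of 0.1); treating the selection probability as proportional to the rank is an assumption of this sketch, as are the function names.

```python
import random

def rank_weights(n_pop=10, best=2.5, step=0.1):
    """Ranks 2.5, 2.4, ..., 1.6 for a population of 10 individuals."""
    return [round(best - step * k, 10) for k in range(n_pop)]

def rank_select(population, fitness, rng, n_select):
    """Draw n_select individuals with probability proportional to their rank.

    Assumes the population is small enough for all ranks to stay positive.
    """
    ordered = sorted(population, key=fitness, reverse=True)
    weights = rank_weights(len(ordered))
    return rng.choices(ordered, weights=weights, k=n_select)

weights = rank_weights()
population = [[1, 1], [0, 0], [1, 0]]
chosen = rank_select(population, sum, random.Random(0), n_select=5)
```

Under this reading the best of 10 individuals is selected with probability 2.5/20.5 and the worst with probability 1.6/20.5, so the selection pressure is mild and independent of the actual fitness gaps.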

The initial population can be generated in one of the following ways: 

— randomly or 

— for each individual the i-th atom is chosen such that it satisfies at least
one classical formula appearing in the i-th weight literal.

We use local search as a HeuristicImprovement() procedure in our GA. It consists 
of the following steps: 
— for an individual M all the weight literals are divided into two sets; the
first set (denoted B) contains all satisfied formulas, while the second one
(denoted W) contains all the remaining formulas,

— from the set B the formula $t_B \; \rho_B \; c_B$ (called the best one) with
the biggest difference $|\mu(t_B) - c_B|$ between the left and the right side
is found,

— similarly, from the set W the formula $t_W \; \rho_W \; c_W$ (the worst one)
with the biggest difference $|\mu(t_W) - c_W|$ is found,

— two sets of atoms are determined; the first set $B_{At(f)}$ contains all the
atoms from M satisfying at least one classical formula $\alpha_i^B$ from
$t_B = a_1^B w(\alpha_1^B) + \ldots + a_k^B w(\alpha_k^B)$, while the second
one $W_{At(f)}$ contains all the atoms from M satisfying at least one classical
formula $\alpha_i^W$ from
$t_W = a_1^W w(\alpha_1^W) + \ldots + a_l^W w(\alpha_l^W)$,

— the probabilities of atoms from $B_{At(f)} \setminus W_{At(f)}$ are changed
such that $t_B \; \rho_B \; c_B$ remains satisfied, although the distance
$|\mu(t_B) - c_B|$ is decreased, and

— the probabilities of atoms from $W_{At(f)} \setminus B_{At(f)}$ are changed
trying to satisfy $t_W \; \rho_W \; c_W$.

We have experimented with the following choices in the above local search pro- 
cedure: 

1. probabilities of only one atom from $B_{At(f)} \setminus W_{At(f)}$ and only
one atom from $W_{At(f)} \setminus B_{At(f)}$ are changed,

2. probabilities of all atoms from $B_{At(f)} \setminus W_{At(f)}$ and all
atoms from $W_{At(f)} \setminus B_{At(f)}$ are changed,

3. the procedure is applied on the best individual M only, 

4. the procedure is applied on all individuals from the population, 

5. the procedure is performed only once, and 

6. the procedure is repeated until no improvement of degrees of unsatisfiability 
is obtained. 

As a starting point of our work, a program which randomly generates
satisfiable weight formulas in wfc-form (with classical formulas in disjunctive
normal form) was developed. It allows us to measure the success rate of the
algorithm, i.e. the percentage of solved problem instances. The corresponding
input parameters are: an integer used as a seed for the initialization of the
random number generator, the number of primitive propositions N, the number of
atoms L, the maximal number S of summands in weight terms, and the maximal
number D of disjuncts in disjunctive normal forms of classical formulas
appearing in weight terms.

Two kinds of problem instances are generated. In the first one, given the
above parameters, the program randomly generates L atoms and their
probabilities (with the constraint that the sum of the probabilities must be
equal to 1) representing a model M. Next, a weight formula $f$ containing L
basic weight formulas is generated. It contains primitive propositions from the
set $\{p_1, \ldots, p_N\}$ only. Every weight literal contains at most S
summands in its weight term. Every classical formula is in disjunctive normal
form with at most D disjuncts, while every disjunct is a conjunction of at most
N literals. For every weight term, coefficients are chosen and the probability
of the weight term is computed. Next, the sum sp(t) of positive coefficients
and the sum sn(t) of negative coefficients are computed. Finally, the right
side value of the weight literal, between sp(t) and sn(t), and the relation
sign are chosen such that $M \models f$. In the second kind of problem
instances, there is an additional parameter $\delta$. For every weight term t
the probability $\mu_M(t)$ is computed. A lower and an upper bound
($lb(t) = \mu_M(t) - \delta$ and $ub(t) = \mu_M(t) + \delta$) of the
probability of t are chosen, and two basic weight formulas of the form
$t \ge lb(t)$ and $t < ub(t)$ are produced. In that way a weight formula with
2L basic weight formulas is obtained.
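
The construction of the second kind of instances can be illustrated by a short Python sketch. The dictionary of precomputed term values, the triple encoding of weight literals, and the function name `bounded_literals` are assumptions made only for this example.

```python
from fractions import Fraction

def bounded_literals(term_values, delta):
    """For each weight term with value p under M, emit t >= p - delta and t < p + delta."""
    literals = []
    for name, p in term_values.items():
        literals.append((name, ">=", p - delta))   # lower bound lb(t)
        literals.append((name, "<", p + delta))    # upper bound ub(t)
    return literals

# One weight term whose value under the generating model is 3/10, with delta = 1/10.
term_values = {"t1": Fraction(3, 10)}
lits = bounded_literals(term_values, Fraction(1, 10))
```

Both literals hold in the generating model whenever delta is positive, so the resulting formula with 2L literals is satisfiable by construction.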

A description of our file format for PrSAT problems as well as the problem
generator and the problem instances used can be found at
www.mi.sanu.ac.yu/~zorano/prsat/prsat.html.

5 Experimental Results 

All of our experiments were done under the Linux operating system running on
IBM-PC compatible computers with an Intel processor (Pentium III/800MHz)
and 256MB of RAM. The 10 problem instances of the first kind were generated
for N = 15 and L = 15, N = 15 and L = 30, N = 30 and L = 30, N = 30 and
L = 60, and N = 60 and L = 60. For every instance S = 5 and D = 5. For
every fixed combination of N and L there were 2 different instances, numbered
from 1 (for N = 15 and L = 15) to 10 (for N = 60 and L = 60). Also, the 3
problem instances of the second kind were generated for N = 15 and 2L = 16,
N = 15 and 2L = 30, and N = 30 and 2L = 30, numbered from 11 to 13. In
those instances $\delta$ = 0.1.

We investigated several variants of our algorithm. They differ in the mutation
probability and the type of the applied local search procedure. Abbreviations of
the various tested variants of our GA, as well as the corresponding
parameters, are given in Table 1. ’2, random’ in the column ’local search
procedure, atoms’ means that the probabilities of only one randomly chosen atom
from $B_{At(f)} \setminus W_{At(f)}$ and only one randomly chosen atom from
$W_{At(f)} \setminus B_{At(f)}$ are changed, while ’repeat’ in the column
’local search procedure, performed’ means that the procedure is repeated until
no improvement of the degree of unsatisfiability is obtained. Since the number
of trials required to find a solution can vary, all the presented results are
averaged over 5 independent trials. The maximal number of generations was set
to 2000. The results are summarized in Tables 2 and 3.

Table 2 contains results for the first set of the problem instances (numbered 
1 to 10), and Table 3 contains results for the second set of the problem instances 
(numbered 11 to 13). Each table entry contains the number of successful trials 
(if it is not 5), the average number of generations and the average running time 
(in seconds) in trials where a solution was found. 

Comparing the performance of the variants shows that the algorithm based on
the local search procedure without the mutation operator gives the worst results.

Table 1. Parameters of the tested variants of GA

                    initial     local search procedure
variant   mutation  population  individual(s)  atoms      performed
GA        +         random      not applied
CHAAR     +         chosen      all            all        repeat
CHRA      +         chosen      all            2, random  only once
CHAA      +         chosen      all            all        only once
CHRAR     +         chosen      all            2, random  repeat
RHAAR     +         random      all            all        repeat
RHRA      +         random      all            2, random  only once
RHAA      +         random      all            all        only once
RHRAR     +         random      all            2, random  repeat
RHBA      +         random      the best one   all        only once
CHBA      +         chosen      the best one   all        only once
RHBR      +         random      the best one   2, random  only once
CHBR      +         chosen      the best one   2, random  only once
RHBAR     +         random      the best one   all        repeat
CHBAR     +         chosen      the best one   all        repeat
RHBRR     +         random      the best one   2, random  repeat
CHBRR     +         chosen      the best one   2, random  repeat
RHABRWM   -         random      all            2, random  repeat

Table 2. Results on the first set of problem instances

variant  1       2        3          4          5         6         7           8              9        10
GA       1/0.01  1/0.01   186/2.81   12/0.22    4/0.19    33/1.6    22/3.87     749/136.6      3/1.71   21/13.57
CHAAR    1/0.14  1/0.19   25/29.56   4/3.77     1/5.68    1/4.75    18/371.7    74/1827.92     1/95.82  1/196.86
CHRA     2/0.03  3/0.04   600/29.97  10/0.6     4/0.5     14/2.27   32/17.88    3/961/581.85   2/3.93   15/30.75
CHAA     1/0.06  1/0.07   88/112.92  4/1.4      4/2/1.12  6/78.5    8/91.72     271/2710.48    1/27.55  4/136.42
CHRAR    2/0.02  2/0.02   197/11.29  9/0.5      5/0.87    6/1.1     30/18.67    3/1088/707.8   2/3.76   8/16.07
RHAAR    1/0.01  51/0.01  40/47.64   4/3.02     1/3.73    3/11.72   7/149.93    30/784.78      1/85.41  1/123.11
RHRAR    1/0.01  1/0.01   205/12.0   8/0.44     2/0.44    12/2.55   17/10.23    3/546/360.14   2/5.2    14/28.74
RHAA     1/0.01  1/0.01   83/27.91   8/2.65     1/1.3     4/5.85    10/124.72   883/9045.75    1/34.84  2/74.55
RHRA     1/0.01  1/0.01   179/8.44   9/0.5      3/0.46    10/1.66   12/6.86     3/269/160.87   2/3.63   11/22.46
RHBA     1/0.01  1/0.01   315/13.77  10/0.65    1/0.16    21/5.0    14/10.61    1/316/422.35   1/2.67   6/22.11
CHBA     3/0.03  4/0.04   298/15.53  14/0.85    4/0.54    66/9.18   24/28.05    3/415/542.12   2/4.6    7/39.98
RHBR     1/0.01  1/0.01   330/6.07   10/0.25    7/0.36    36/2.1    17/3.61     4/980/175.53   2/1.46   22/17.55
CHBR     3/0.02  4/0.02   519/9.61   8/0.18     4/0.23    45/2.65   38/8.17     3/831/185.80   3/2.1    16/12.7
RHBAR    1/0.01  1/0.01   191/10.24  8/0.66     1/0.28    31/10.06  18/19.62    2/740/1341.7   1/7.63   8/61.7
CHBAR    3/0.03  3/0.04   93/4.77    8/0.76     6/1.7     31/9.66   26/30.57    1/664/1004.68  2/17.75  9/75.15
RHBRR    1/0.01  1/0.01   353/6.53   8/0.17     4/0.21    47/2.7    22/4.88     3/237/53.4     2/1.76   22/17.78
CHBRR    2/0.01  4/0.02   263/4.89   14/0.29    5/0.28    17/1.05   30/6.54     803/179.8      3/2.47   19/15.03
RHABRWM  1/0.01  1/0.01   1/21/1.15  4/10/3.98  4/0.61    4/8/1.46  2/45/23.04  0              2/3.64   3/40/76.23



Almost all unsuccessful trials of that variant terminate because of premature
convergence (i.e., the population becomes highly homogeneous). On the other hand,
the pure GA, as well as the various combinations of the local search variants with
the GA, are more successful. Interestingly, the success rates of those variants are
more or less similar. This means that, in almost all cases, neither the way in which
the initial population is generated nor the chosen variant of the local search
procedure influences the results significantly. However, the variants that apply
the local search procedure to the best individual only perform worse than the
other variants on problem 8.
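
Premature convergence of this kind can be monitored with a simple homogeneity
measure. The following is a minimal sketch for illustration only. It assumes
that individuals are equal-length sequences of discrete genes (for example bit
strings), which need not match the encoding used in the experiments. The names
homogeneity and has_converged, and the 0.95 threshold, are hypothetical.

```python
from itertools import combinations


def homogeneity(population):
    """Mean fraction of gene positions on which two individuals agree.

    Assumption: every individual is a sequence of the same length.
    Returns 1.0 when all individuals are identical.
    """
    pairs = list(combinations(population, 2))
    if not pairs:
        # Zero or one individual: trivially homogeneous.
        return 1.0
    length = len(population[0])
    agreements = sum(
        sum(a == b for a, b in zip(x, y)) for x, y in pairs
    )
    return agreements / (len(pairs) * length)


def has_converged(population, threshold=0.95):
    """Hypothetical stopping test: the population is considered converged
    once its homogeneity reaches the given threshold."""
    return homogeneity(population) >= threshold
```

A trial could call has_converged once per generation and stop, or restart,
as soon as it returns True.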

Table 3. Results on the second set of problem instances

         11        12            13
GA       236/1.04  3/963/14.66   1/1055/46.51
CHAAR    348/3.36  3/1298/42.49  2/945/91.79
CHRA     230/3.28  1/1681/82.58  3/1557/227.17
CHAA     384/3.7   0             2/880/85.9
CHRAR    286/6.69  3/1063/52.13  1/1359/198.26
RHAAR    182/1.75  3/1104/36.2   3/1246/121.3
RHRAR    318/4.5   3/1430/69.98  3/1293/188.21
RHAA     232/2.25  3/1091/35.66  2/1111/108.33
RHRA     151/2.12  2/1338/64.24  2/1725/252.09
RHBA     174/0.84  2/963/16.37   4/1669/82.26
CHBA     358/1.73  2/1138/19.33  0
RHBR     191/1.01  4/1185/22.1   0
CHBR     287/1.5   3/1250/23.31  2/1411/76.57
RHBAR    364/1.77  2/590/9.87    2/1279/63.22
CHBAR    428/2.08  2/1136/19.18  2/1242/61.55
RHBRR    288/1.52  2/1025/19.01  0
CHBRR    683/3.62  4/1030/19.17  2/1206/65.28
RHABRWM  0         0             0



The pure GA is the fastest variant because it does not contain the local
search procedure. However, Table 3 shows that enriching our GA with the local
search procedure increases the success rates at the cost of more evaluations. In
column 13, which corresponds to the biggest instance, the pure GA has only
one success, while the majority of the enriched GAs have better results. We
expect that on larger test cases the difference between the two approaches
will be even greater.

6 Conclusion 

We have described a genetic algorithm for solving the satisfiability problem in a
probabilistic logic. Since the problem is NP-complete, we have tried to develop a
semi-decidable but fast procedure, and to make a tradeoff between completeness
and computation time. To the best of our knowledge, this is the first paper that
studies PrSAT using such an approach. Although it is clear that far more tests
and an exhaustive study should be done, our preliminary results indicate that
the genetic approach to PrSAT works well.

Besides testing numerous problem instances and tuning the parameters of the
algorithm, there are many other directions for further investigation. In the case
of SAT, as a general method for increasing the quality of GAs, researchers have
developed algorithms that redefine their fitness functions while running [1] to
guide the search in the right direction. It would be interesting to see whether
such an idea is suitable for PrSAT. Since GAs are semi-decidable procedures, we
can ask what to do when we do not know whether a given formula is satisfiable
and the maximal number of generations has been reached. One possibility is to
decrease the allowed number of steps and to restart the procedure from another
random initial point instead of continuing an unsuccessful search. Obviously, the
problem is to determine when to restart. It has already been reported that, for
classical propositional formulas in conjunctive normal form with each clause
consisting of 3 literals, particularly hard instances of the satisfiability problem
have a clause-to-variable ratio of about 4.3 [4]. Although more parameters are
important in the case of PrSAT and formulas in wfc-form, it would be interesting
to explore whether such a phase transition phenomenon exists there. Also, it must
be checked whether our test generation methodology is fair and adequate.
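
The restart idea mentioned above can be sketched as follows. This is an
illustration under stated assumptions, not the procedure used in the
experiments: run_ga is a hypothetical callable that runs one GA trial for at
most a given number of generations and returns a model, or None if none was
found. The names run_with_restarts, total_budget and budget_per_run are
likewise hypothetical.

```python
import random


def run_with_restarts(run_ga, total_budget, budget_per_run, seed=0):
    """Split a total generation budget into short, independent GA runs.

    run_ga(max_generations, rng) is assumed to return a model or None.
    Each run is charged its full allotment, even if it stops early.
    Returns (model_or_None, generations_charged).
    """
    master = random.Random(seed)
    used = 0
    while used < total_budget:
        steps = min(budget_per_run, total_budget - used)
        # A fresh generator per run gives a new random initial population.
        rng = random.Random(master.getrandbits(32))
        model = run_ga(steps, rng)
        used += steps
        if model is not None:
            return model, used
    # Budget exhausted: satisfiability of the formula remains unknown.
    return None, used
```

A result of None does not show that the formula is unsatisfiable. It only
means that no model was found within the budget, which reflects the incomplete
nature of the procedure.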

In summary, we feel that the experimental results reported here are encouraging,
and that future work along these lines should yield new insights into
the field of probabilistic satisfiability.

Note added in proof. We have been informed by the anonymous referee
about papers [3,9,12,14]. In [9,12,14] the powerful column generation technique
of linear programming was applied to PrSAT (which was denoted by PSAT
in those papers). In [3] a version of genetic algorithms was used to calculate
intervals of propagated probabilities in causal networks. We plan to consider
those results and compare them to ours in future work.